Preface



Static analysis is increasingly recognized as a fundamental research area aimed
at studying and developing tools for high performance implementations and ver- 
ification systems for all programming language paradigms. The last two decades 
have witnessed substantial developments in this field, ranging from theoretical 
frameworks to design, implementation, and application of analyzers in optimiz- 
ing compilers. 

Since 1994, SAS has been the annual conference and forum for researchers 
in all aspects of static analysis. This volume contains the proceedings of the 6th 
International Symposium on Static Analysis (SAS’99) which was held in Venice, 
Italy, on 22-24 September 1999. The previous SAS conferences were held in 
Namur (Belgium), Glasgow (UK), Aachen (Germany), Paris (France), and Pisa 
(Italy). 

The program committee selected 18 papers out of 42 submissions on the basis 
of at least three reviews. The resulting volume offers to the reader a complete 
landscape of the research in this area. The papers contribute to the following 
topics: foundations of static analysis, abstract domain design, and applications 
of static analysis to different programming paradigms (concurrent, synchronous, 
imperative, object oriented, logical, and functional). In particular, several papers 
use static analysis for obtaining state space reduction in concurrent systems. New 
application fields are also addressed, such as the problems of security and secrecy. 

In addition to these high quality technical papers, SAS’99 included in its
program several outstanding invited speakers. Daniel Weise, Dennis Volpano, David
McAllester, Don Sannella, David Schmidt, Mary Lou Soffa, and Craig Chambers
accepted our invitation to give invited talks or tutorials. Their contributions
are also included in this volume. 

In general, it is clear to us that the role of static analysis is bound to become 
more and more important in the future due to the enormous popularity of the 
Internet. For the latter requires the construction of increasingly complex software 
systems for which efficiency but also security are crucial issues. 

The staff of the department of computer science at Ca’ Foscari University
and that of the department of mathematics at Padova University were extremely 
helpful in handling all aspects of the symposium. 

Special thanks are also due to the institutions that sponsored the event: 
EAPLS, ALP, CNR, Compulog, Ca’ Foscari University, and Padova University.
A Formal Study of Slicing for Multi-threaded
Programs with JVM Concurrency Primitives*



John Hatcliff¹, James Corbett², Matthew Dwyer¹, Stefan Sokolowski¹, and
Hongjun Zheng¹

¹ SAnToS Laboratory, Kansas State University***

² University of Hawaii†



Abstract. Previous work has shown that program slicing can be a useful 
step in model-checking software systems. We are interested in applying 
these techniques to construct models of multi-threaded Java programs. 
Past work does not address the concurrency primitives found in Java, 
nor does it provide the rigorous notions of slice correctness that are nec- 
essary for reasoning about programs with non-deterministic behaviour 
and potentially infinite computation traces. 

In this paper, we define the semantics of a simple multi-threaded lan- 
guage with concurrency primitives matching those found in the Java 
Virtual Machine, we propose a bisimulation-based notion of correctness 
for slicing in this setting, we identify notions of dependency that are 
relevant for slicing multi-threaded Java programs, and we use these
dependencies to specify a program slicer for the language presented in the
paper. Finally, we discuss how these dependencies can be refined to take 
into account common programming idioms of concurrent Java software. 



1 Introduction 

Program slicing is a program reduction technique that has been widely applied 
in software engineering, static analysis, and debugging applications [12]. In
previous work [3], we showed how backward static slicing can be used as a
component in constructing finite-state models of sequential software systems.
Existing model-checking tools can automatically check these models against
software specifications (written in various temporal logics). The main idea is that slicing
can throw away portions of the program that are irrelevant to the specification 
being verified. This often reduces the size of the software model and reachable 
state-space. Thus, it often reduces the time required for model-checking. 

The previous work provides part of the foundational theory for a set of tools 
that we are building for model-checking Java programs called the Bandera Tool 
Set. In essence, Bandera is a pipeline of tools that compile Java source code to 
inputs of existing model-checking tools such as SMV [9] and SPIN [5].

* This work was supported in part by NSF under grants CCR-9633388, CCR-9703094,
CCR-9708184, and CCR-9701418 and DARPA/NASA under grant NAG 21209.

*** 234 Nichols Hall, Manhattan KS, 66506, USA.
{hatcliff,dwyer,stefan,zheng}@cis.ksu.edu

† corbett@mit.edu
The goal of the present work is to scale up our previous slicing-based
model-construction techniques to the concurrency mechanisms of Java. We give a
formal semantics for a small core language that contains the concurrency primitives
found in the Java Virtual Machine (Section 2). We describe how the notion of 
weak-bisimulation from concurrency theory can be adapted to define a notion of 
correctness for slicing that is relevant for multi-threaded programs with infinite 
execution traces (Section 3). Building on various notions of dependency found in 
earlier work on slicing sequential and multi-threaded programs, we introduce
additional notions of dependency that arise when considering Java’s concurrency
primitives, and dependencies that arise when trying to preserve the semantics 
of programs with infinite execution traces. We discuss how these dependencies 
stem from assumptions that may be overly pessimistic when one considers con- 
structing finite-state models for typical Java source code, and explain how these 
dependencies can be refined (Section 4). After a discussion of related work
(Section 5), we conclude with a brief discussion of our experiments with a prototype
implementation applied to concurrent Java software (Section 6). 

We also refer the reader to the extended version of this paper which contains 
additional examples, more discussion about the connections to model-checking, 
and details concerning correctness proofs. The extended paper and more infor- 
mation about the Bandera project can be found at 
http://www.cis.ksu.edu/santos/bandera.

2 Concurrent FCL 

Bandera is built on top of the Soot Java compiler framework developed by Laurie 
Hendren’s Sable group at McGill University. In the Soot framework, Java
programs are translated to an intermediate language called Jimple. Jimple is es- 
sentially a language of control-flow graphs where (a) statements appear in three- 
address-code form (the explicit stack manipulation inherent in JVM instructions 
has been removed by introducing temporary variables), and (b) various Java 
constructs such as method invocations and synchronized statements are repre- 
sented in terms of their virtual machine counterparts (such as invokevirtual, 
and monitorenter, monitorexit). 

For our formal study of slicing in this setting, we define a language called 
CFCL (Concurrent Flowchart Language) that captures the essence of Jimple 
control-flow graphs and focuses tightly on the JVM instructions for thread syn- 
chronization including wait, notify, notifyall (start and join are treated 
in the extended version of the paper). For simplicity, we consider evaluation of 
assignment statements to be atomic. Thus, we do not treat the two-level storage 
model for JVM threads [8, Ch. 8]. Due to the focus on concurrency issues, we 
omit features such as method calls, dynamic object creation and exceptions. 

Figure 1 presents the syntax of CFCL, and Figure 2 gives a simple CFCL 
program. A CFCL program begins with a declaration of variables x* (each vari- 
able is implicitly typed as nat). To model the implicit lock associated with each 
Java object, the variable list is followed by a list of lock identifiers k* that can 
be used in synchronization primitives. 
Syntax domains

  $p \in$ Programs
  $d \in$ Threads
  $b \in$ Blocks
  $a \in$ Assignments
  $s \in$ Syncs
  $e \in$ Expressions
  $j \in$ Jumps
  $t \in$ Thread-identifiers
  $x \in$ Variables
  $c \in$ Constants
  $l \in$ Block-Labels
  $k \in$ Locks
  $o \in$ Operations

Grammar

  p ::= (x*) (k*) d+
  d ::= begin-thread t (l) b+ end-thread;
  b ::= l : a* j  |  l : s goto l;
  a ::= x := e;
  s ::= enter-monitor k; | exit-monitor k; | wait k; | notify k; | notify-all k;
  e ::= c | x | o(e*)
  j ::= goto l; | return; | if e then l1 else l2;

Fig. 1. Syntax of CFCL



The body of a CFCL-program is a series $d^+$ of one or more thread declarations.
A thread declaration consists of a thread identifier t and the label l of
the initial block to be executed within the thread, followed by a list $b^+$ of basic
blocks.

There are two kinds of basic blocks: (1) a block containing a list of assign- 
ments followed by a jump, or (2) a synchronization block containing a single 
synchronization statement and an unconditional jump. The assignment state- 
ments operate on variables shared between all threads. Each synchronization 
construct operates with respect to a particular lock k (the semantics will be
explained below). For conditional tests, any non-zero value represents true and
zero represents false. The target of jump j must appear in the same thread as j. 

In the presentation of slicing, we need to reason about nodes in a
statement-level control-flow graph (i.e., a graph where there is a separate node n for each
assignment, synchronization statement, and jump) for a given thread t. We will
assume that each statement has a unique index i within each block, and that
each block label is unique across the entire program. Then, a node n can be
uniquely identified by a pair [l.i] where l is a block label and i is an index value.
In Figure 2, statement indices are given as annotations in brackets [·] (ignore
the * annotations for now; they are used to indicate the results of slicing later
on). For example, the first assignment in the prod_put block has the unique
identifier (or node number) [prod_put.1].

A flow graph $G = (N, E, s, e)$ consists of a set N of statement nodes, a set E
of directed control-flow edges, a unique start node s, and a unique end node e such
that all nodes in N are reachable from s, and e is reachable from all nodes in
N. Node n dominates node m in G (written dom(n, m)) if every path from the
start node s to m passes through n (note that this makes the dominates relation
reflexive). Node n post-dominates node m in G (written post-dom(n, m)) if every
path from node m to the end node e passes through n. A CFG back edge is an
(count total prod cons)   /* shared variables */
(buffer tally)            /* lock identifiers */

/* producer code */                 /* consumer code */
begin-thread producer               begin-thread consumer
  (prod_obtain_lock)                  (cons_obtain_lock)
prod_obtain_lock:                   cons_obtain_lock:
  enter-monitor buffer;     [1]       enter-monitor buffer;     [1]
  goto prod_check;          [2]       goto cons_check;          [2]
prod_check:                         cons_check:
  if =(count 2)             [1]       if =(count 0)             [1]
    then prod_wait                      then cons_wait
    else prod_put;                      else cons_get;
prod_wait:                          cons_wait:
  wait buffer;              [1]       wait buffer;              [1]
  goto prod_check;          [2]       goto cons_check;          [2]
prod_put:                           cons_get:
  count := +(count 1);      [1]       count := -(count 1);      [1]
  if =(count 1)             [2]       if =(count 1)             [2]
    then prod_wakeup                    then cons_wakeup
    else prod_release_lock;             else cons_release_lock;
prod_wakeup:                        cons_wakeup:
  notify buffer;            [1]       notify buffer;            [1]
  goto prod_release_lock;   [2]       goto cons_release_lock;   [2]
prod_release_lock:                  cons_release_lock:
  exit-monitor buffer;      [1]       exit-monitor buffer;      [1]
  goto prod_enter_tally;    [2]       goto cons_enter_tally;    [2]
prod_enter_tally:           *       cons_enter_tally:           *
  enter-monitor tally;      [1]*      enter-monitor tally;      [1]*
  goto prod_update_tally;   [2]*      goto cons_update_tally;   [2]*
prod_update_tally:          *       cons_update_tally:          *
  total := +(total 1);      [1]*      total := +(total 1);      [1]*
  goto prod_exit_tally;     [2]*      goto cons_exit_tally;     [2]*
prod_exit_tally:            *       cons_exit_tally:            *
  exit-monitor tally;       [1]*      exit-monitor tally;       [1]*
  goto prod_cycle;          [2]*      goto cons_cycle;          [2]*
prod_cycle:                         cons_cycle:
  prod := +(prod 1);        [1]       cons := +(cons 1);        [1]*
  if 1                      [2]       if 1                      [2]
    then prod_obtain_lock               then cons_obtain_lock
    else exit_prod;                     else exit_cons;
exit_prod: return;          [1]     exit_cons: return;          [1]
end-thread;                         end-thread;

Fig. 2. A CFCL producer/consumer control system with buffer size 2
edge whose target dominates its source in G [11, p. 191]. Given a back edge
$m \to n$ in G, the natural loop of $m \to n$ is the subgraph consisting of the set of
nodes containing n and all nodes from which m can be reached in the flowgraph
without passing through n and the edge set connecting all the nodes in the
subgraph [11, p. 191]. Node n is called the loop header. A CFG $G = (N, E, s, e)$
is reducible (or well-structured) if E can be partitioned into disjoint sets $E_f$ (the
forward edge set) and $E_b$ (the back edge set) such that $(N, E_f)$ forms a DAG
in which each node can be reached from the entry node and all edges in $E_b$ are
back edges in G. Muchnick [11, p. 196] notes that this implies that in a reducible
flowgraph all the loops are natural loops characterized by their back edges and
vice versa. It follows from the definition of reducible flowgraph that there are no
jumps into the middle of loops: each loop is entered only through its header.
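
The definitions of dominance, back edge, and natural loop can be made concrete with a small sketch. The Python fragment below is only an illustration and is not taken from Bandera: the function names (`dominators`, `back_edges`, `natural_loop`) and the successor-list encoding of a flow graph are assumptions made for this example, and it uses a simple iterative fixed-point computation, assuming every node is reachable from the start node.

```python
from typing import Dict, Hashable, List, Set, Tuple

Node = Hashable
Graph = Dict[Node, List[Node]]  # hypothetical encoding: node -> successor list


def _nodes(graph: Graph) -> Set[Node]:
    return set(graph) | {s for succs in graph.values() for s in succs}


def _preds(graph: Graph) -> Dict[Node, Set[Node]]:
    preds: Dict[Node, Set[Node]] = {n: set() for n in _nodes(graph)}
    for n, succs in graph.items():
        for s in succs:
            preds[s].add(n)
    return preds


def dominators(graph: Graph, start: Node) -> Dict[Node, Set[Node]]:
    """Return dom, where n in dom[m] means dom(n, m) (reflexive)."""
    nodes, preds = _nodes(graph), _preds(graph)
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:  # iterate until the dominator sets stop shrinking
        changed = False
        for n in nodes - {start}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def back_edges(graph: Graph, start: Node) -> List[Tuple[Node, Node]]:
    """Edges m -> n whose target n dominates their source m."""
    dom = dominators(graph, start)
    return [(m, n) for m, succs in graph.items() for n in succs if n in dom[m]]


def natural_loop(graph: Graph, back_edge: Tuple[Node, Node]) -> Set[Node]:
    """Header n plus all nodes that reach m without passing through n."""
    m, n = back_edge
    preds = _preds(graph)
    loop, stack = {n, m}, ([m] if m != n else [])
    while stack:
        for p in preds[stack.pop()]:
            if p not in loop:
                loop.add(p)
                stack.append(p)
    return loop
```

For a four-node graph with a header `h` and a body `b` that jumps back to `h`, `back_edges` reports the single edge from `b` to `h`, and `natural_loop` returns the set containing `h` and `b`.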

The syntax of Jimple allows arbitrary control-flow because it was originally
designed to be the target of a byte-code decompiler. However, we are using
it as the target of a Java compiler front-end. Thus we can impose several
constraints on the control-flow structure of each CFCL thread corresponding to
constraints/properties of Java source code. Since Java only allows well-structured
control-flow (no goto’s), the control structure of each CFCL thread is required
to form a reducible flow graph. Since the enter-monitor k and exit-monitor k
constructs arise from compiling Java synchronized blocks, we assume that
enter-monitor k and exit-monitor k come in “matching pairs”. In addition,
they are the unique start node and unique end node of a sub-flow-graph
appearing in the containing thread’s flow-graph. Thus, for each monitor delimiter
we are able to obtain a CFG corresponding to the monitor’s critical region.
Based on this condition, we define a function CR that maps each node n to the
inner-most critical region in which it appears. That is, if $CR(n) = (m_1, m_2)$
then $m_1$ is an enter-monitor k command with a matching exit-monitor k at
$m_2$ and these two nodes form the inner-most critical region in which n appears.
Our compiler front-end annotates each conditional with the type of source
construct that gave rise to it. For example, we can tell if a conditional arises from
a conditional in the source code, or if it is the exit conditional of a loop (e.g., a
for or while). Since it is impossible to tell in general whether a Java loop will
terminate (even for for loops), we will refer to all exit conditionals of loops as
pre-divergence points.

The remaining hitch in generating CFGs satisfying these constraints is that
some threads do not satisfy the “unique end node” property required by the
definition of flowgraph. Specifically, the thread may (a) have no return, or (b)
have multiple return’s. The first case appears if the thread contains one or
more loops with no exit conditional (this guarantees an infinite loop). Since all
our flowgraphs are reducible, each such loop is a natural loop (as defined above)
uniquely characterized by its back edge. For simplicity, at such end nodes we will
replace the unconditional goto l jump associated with the back edge with the
conditional jump if 1 then l else return. To work around the second case, we
assume that when we extract the CFG from a thread t, we insert an additional
node labeled halt that has no successors and its predecessors are all the return
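  $v \in \textit{Values} = \{0, 1, 2, \ldots\}$
  $l \in \textit{Labels} = \textit{Block-Labels} \cup \{\ldots\}$
  $\sigma \in \textit{Stores} = \textit{Variables} \to \textit{Values}$
  $pc \in \textit{PCs} = \textit{Thread-identifiers} \to \textit{Nodes}$
  $\mathcal{R} \in \textit{Runnable} = \textit{Thread-identifiers} \to \{\textit{true}, \textit{false}\}$
  $\mathcal{B} \in \textit{Blocked-Sets} = \textit{Locks} \times \textit{Thread-identifiers} \to \textit{Lock-Counts}$
  $\mathcal{W} \in \textit{Wait-Tables} = \textit{Locks} \times \textit{Thread-identifiers} \to \textit{Lock-Counts}$
  $\mathcal{L} \in \textit{Lock-Tables} = \textit{Locks} \to (\textit{Thread-identifiers} \times \{1, 2, 3, \ldots\} \cup \{\textit{false}\})$
  $\textit{Lock-Counts} = \{0, 1, 2, \ldots\}$

Fig. 3. Operational semantics of CFCL programs (semantic values)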



nodes from t. Given a program p, we will refer to the set of thread CFGs from
p as p’s CFG (technically, this is a misnomer since a set of control-flow graphs
is not a control-flow graph itself).

To access code at particular program points within a given CFCL-program
p, we use the following functions. A code map function code maps a CFG node
n to the code for the statement that it labels. For example, code([prod_put.1])
yields the assignment count := +(count 1);. A function first maps a label
l of p to the first CFG node occurring in the block labeled l. For example,
first(prod_put) = [prod_put.1]. A function def maps each node to the set of
variables defined (i.e., assigned to) at that node (always a singleton or empty
set), and ref maps each node to the set of variables referenced at that node. A
thread map $\theta$ maps a CFG node n to the identifier of the thread to which n belongs. For
example, $\theta$([prod_put.1]) = producer.

Synchronization in Java is achieved using wait and notify monitors [8]. Fig- 
ure 2 illustrates the use of monitors to achieve proper synchronization in a simple 
producer/consumer system. Note that the example omits the actual manipula- 
tion of a shared buffer and instead gives only the code that properly controls 
buffer access. The implicit lock that would be associated with the Java buffer 
object is represented by the CFCL lock buffer.

Consider the producer process of the example (recall that all variables have
a default value of 0). The process begins by trying to acquire the buffer lock
in prod_obtain_lock (until the process acquires the lock, it is held in a blocked
set associated with the lock). Once the lock is acquired, the producer will check
to see if the buffer is already full (i.e., if the buffer count is 2). If the buffer is
full, the producer will release the lock and suspend itself by executing a wait
instruction (this causes the process to be placed in a wait set associated with
the lock); otherwise it will increment the count of items in the buffer. If the
increment causes the buffer to move from an empty to a non-empty state, the
producer notifies the consumer (which may have been waiting due to an empty
buffer). When a process executes a notify k, it continues to hold the lock, but
a process that has been waiting is moved from the wait set for k to the blocked
set for k. Finally, the producer releases the buffer lock with the exit-monitor
instruction in block prod_release_lock, and tries to obtain the tally lock. Once the tally lock
is acquired, the producer increments the total number of produce/consume
actions thus far, and then releases the lock. Finally, the producer increments
the total number of produce actions thus far, and begins the cycle again. The
actions of the consumer process are symmetric.

To formalize these notions, the semantics of a CFCL program p is expressed
as transitions on configurations of the form $(pc, \sigma, \mathcal{R}, \mathcal{B}, \mathcal{W}, \mathcal{L})$. Figure 3 gives the
formal definition for each of the configuration components. The program counter
pc assigns to each thread t the next node of t to be executed. The store $\sigma$ maps
each variable to a value. The run state table $\mathcal{R}$ maps each thread t to either
true (signifying that t is in a runnable state: its next instruction given by the
pc can be immediately executed) or false (signifying that t is blocked waiting
to gain access to a lock, or it has suspended itself via a wait instruction, or
it has successfully terminated by executing a return instruction). The blocked
table $\mathcal{B}$ assigns to each pair (k, t) the number of times thread t needs to acquire
lock k once it is unblocked. Following the semantics of the JVM, each thread
may acquire a lock multiple times [8, p. 376]. $\mathcal{B}(k, t) = 0$ signifies that thread t
is not blocked on lock k. The lock waiting table $\mathcal{W}$ assigns to each pair (k, t) the
number of times thread t needs to acquire lock k once it is awakened. $\mathcal{W}(k, t) = 0$
signifies that thread t is not waiting on lock k. The lock table $\mathcal{L}$ maps each held
lock k to a pair (t, n) where t is the thread that currently holds k, and n is the
number of times that t has acquired the lock (i.e., n is the difference between
the number of preceding enter-monitor k and exit-monitor k instructions
executed by t). If no thread holds lock k, then $\mathcal{L}(k) = \textit{false}$. Formally, the set of
configurations is defined as follows:

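$\textit{Configurations} = (\textit{PCs} \times \textit{Stores} \times \textit{Runnable} \times \textit{Blocked-Sets}$
$\qquad \times\ \textit{Wait-Tables} \times \textit{Lock-Tables}) \cup \{\textit{bad-monitor}\}$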

Our set of configurations includes an exception configuration bad-monitor that 
is meant to model the raising of an IllegalMonitorStateException by the
JVM [8, p. 376]. The semantics will be defined so that no transitions are possible 
once an exception configuration has been reached. 

Figure 4 gives rules that define a CFCL-program indexed transition relation
$\longmapsto_p$ on configurations. The program p that indexes the relation gives rise to the
code map code and thread identifiers used in the semantics. We will omit the 
program index when writing the relation since it will always be clear from the 
context. 

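Execution of a program p begins with an initial configuration $(pc, \sigma, \mathcal{R}, \mathcal{B}, \mathcal{W}, \mathcal{L})$
where $\forall t.\ pc(t) = n_{init}$ where $n_{init}$ is the start node in the CFG for t, $\forall x.\ \sigma(x) = 0$,
$\forall t.\ \mathcal{R}(t) = \textit{true}$, $\forall k, t.\ \mathcal{B}(k, t) = 0$, $\forall k, t.\ \mathcal{W}(k, t) = 0$, and $\forall k.\ \mathcal{L}(k) = \textit{false}$.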

In general, a transition occurs when there exists a runnable thread as
indicated by $\mathcal{R}$. A runnable thread t is arbitrarily chosen (giving rise to
nondeterminism) and the program counter pc and code map are consulted to find the next
command of t to evaluate. There is a separate transition rule for each syntactic 
category of command. Each rule relies on an auxiliary judgement that defines 
the semantics for the particular command category. 
Fig. 4. Operational semantics of CFCL programs (transitions)



The intuition behind the rules for these judgements is as follows.
$\sigma \vdash^t_{expr} e \Rightarrow v$ means that under store $\sigma$, expression e in t evaluates to value v. Note that
expression evaluation cannot change the value of the store. $\sigma \vdash^t_{assign} a \Rightarrow \sigma'$
means that under store $\sigma$, the assignment a yields the updated store $\sigma'$. $\sigma \vdash^t_{jump} j \Rightarrow l$
means that under the store $\sigma$, jump j will cause a transition to the block
labeled l. $\mathcal{R}, \mathcal{B}, \mathcal{W}, \mathcal{L} \vdash^t_{sync} s \Rightarrow \mathcal{R}', \mathcal{B}', \mathcal{W}', \mathcal{L}'$ means that, given the run state
map $\mathcal{R}$, blocked set map $\mathcal{B}$, wait set map $\mathcal{W}$, and lock map $\mathcal{L}$, execution of
the synchronization statement s in thread t may produce updated versions of
each of these structures. Figure 5 gives the rules that define the relation for
synchronization statements (the rules for expressions, assignments, and jumps
are straightforward and are omitted).

For enter-monitor k when lock k is not held, the lock table for k is changed
to show that thread t has acquired the lock once. When the lock is held already
by t, its acquisition count is incremented. When another thread t' holds the lock,
t is placed in the blocked set (with its current acquisition count), and its run
status is changed to “not runnable”.

For exit-monitor k, if thread t’s lock count for k is greater than one, the
count is simply decremented (the lock is not released). If thread t’s lock count
for k is equal to one, then the lock is relinquished. If the set of threads blocked
$$\frac{\mathcal{L}(k) = \textit{false}}{\mathcal{R}, \mathcal{B}, \mathcal{W}, \mathcal{L} \vdash^t_{sync} \text{enter-monitor } k \Rightarrow \mathcal{R}, \mathcal{B}, \mathcal{W}, \mathcal{L}[k \mapsto (t, 1)]}$$

$\cdots$

where $s \in \{\text{exit-monitor } k,\ \text{wait } k,\ \text{notify } k,\ \text{notify-all } k\}$

Fig. 5. Operational semantics of CFCL programs (synchronization statements)
on k is non-empty, then another thread t' is chosen non-deterministically from
the blocked set and is given the lock.

For wait k, if thread t holds the lock, then it is relinquished and t is placed in 
the wait set (along with its lock count). Another thread t' is non-deterministically
chosen from those blocked on k and is given the lock (and its run state is set to 
runnable). 

For notify k, if thread t holds the lock and the wait set is empty, then the
command has the same effect as a no-op. If the wait set is not empty, one thread
(and associated lock count) is non-deterministically removed from the wait set
and added to the blocked set.

For notify-all k, if thread t holds the lock and the wait set is empty, then the
command has the same effect as a no-op (the current blocked set is unioned with
the empty set). If the wait set is not empty, then all threads (and associated lock
counts) are removed from the wait set and added to the blocked set.

Finally, if a thread executes an exit-monitor, wait, notify, or notify-all
on a lock k which it does not hold, then an exception is raised [8, p. 376].
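
The informal reading of the synchronization rules can be sketched as an executable state update. The Python fragment below is a simplified illustration under stated assumptions, not the formal rules of Figure 5: the names `SyncState`, `step_sync`, and `BadMonitor` are invented for this example, the bad-monitor configuration is modelled as an exception, a thread that blocks on a fresh enter-monitor is assumed to record an acquisition count of 1, and the non-deterministic choice is replaced by a `choose` parameter (defaulting to `min`) so that runs are repeatable. Program counters and stores are not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


class BadMonitor(Exception):
    """Stands in for the bad-monitor exception configuration."""


@dataclass
class SyncState:
    runnable: Dict[str, bool]                    # R: thread -> runnable?
    locks: Dict[str, Optional[Tuple[str, int]]]  # L: lock -> (holder, count); None = false
    blocked: Dict[Tuple[str, str], int] = field(default_factory=dict)  # B (absent = 0)
    waiting: Dict[Tuple[str, str], int] = field(default_factory=dict)  # W (absent = 0)


def _hand_over(st: SyncState, k: str, choose: Callable[[List[str]], str]) -> None:
    """Release lock k and, if some thread is blocked on k, give it the lock."""
    candidates = sorted(t for (k2, t) in st.blocked if k2 == k)
    if not candidates:
        st.locks[k] = None
        return
    t2 = choose(candidates)
    st.locks[k] = (t2, st.blocked.pop((k, t2)))
    st.runnable[t2] = True


def step_sync(st: SyncState, t: str, op: str, k: str,
              choose: Callable[[List[str]], str] = min) -> None:
    """Apply synchronization statement `op k` executed by thread t."""
    holder = st.locks.get(k)
    if op == "enter-monitor":
        if holder is None:
            st.locks[k] = (t, 1)              # lock free: acquire once
        elif holder[0] == t:
            st.locks[k] = (t, holder[1] + 1)  # re-entrant acquisition
        else:
            st.blocked[(k, t)] = 1            # assumed count for a fresh request
            st.runnable[t] = False
        return
    if holder is None or holder[0] != t:
        raise BadMonitor(f"{t} executed {op} on {k} without holding it")
    count = holder[1]
    if op == "exit-monitor":
        if count > 1:
            st.locks[k] = (t, count - 1)
        else:
            _hand_over(st, k, choose)
    elif op == "wait":
        st.waiting[(k, t)] = count            # remember the lock count
        st.runnable[t] = False
        _hand_over(st, k, choose)
    elif op in ("notify", "notify-all"):
        waiters = sorted(t2 for (k2, t2) in st.waiting if k2 == k)
        if op == "notify" and waiters:
            waiters = [choose(waiters)]
        for t2 in waiters:                    # move from wait set to blocked set
            st.blocked[(k, t2)] = st.waiting.pop((k, t2))
    else:
        raise ValueError(f"unknown synchronization statement: {op}")
```

Replaying the consumer-waits, producer-notifies scenario described for Figure 2 with this sketch moves the consumer from the wait table to the blocked table on notify, and hands it the buffer lock when the producer executes exit-monitor.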

Returning to the definition of transitions, the “normal final configuration” is
where the run state of all threads is false, and where all thread program counters
yield halt. From Figure 4 it is clear that each transition $c_1 \longmapsto c_2$ executes
exactly one command at some node n. We write $c_1 \stackrel{n}{\longmapsto} c_2$ when $c_1 \longmapsto c_2$ by
executing the command at node n, $c_1 \stackrel{N}{\longmapsto} c_2$ when $c_1 \longmapsto c_2$ by executing the
command at some node $n \in N$, and $c_1 \stackrel{\neg N}{\longmapsto} c_2$ when $c_1 \longmapsto c_2$ by executing the
command at some node $n \notin N$.

3 Correctness of Multi-threaded Program Slicing 

A program slice consists of the parts of a program p that (potentially) affect the 
variable values that are referenced at some program points of interest [12]. The 
program “points of interest” traditionally are called the slicing criterion. 

Definition 1 (slicing criterion). A slicing criterion C for a program p is a
non-empty set of nodes $\{n_1, \ldots, n_k\}$ where each $n_i$ is a node in p’s statement
flow-graph.

In many cases in the slicing literature, the desired correspondence between 
the source program and the slice is not formalized, and this also leads to subtle 
differences between presentations. When a notion of “correct slice” is given, it 
is often stated using the notion of projection for finite deterministic (i.e.,
non-branching) execution traces. Clearly, when moving to the concurrent setting,
notions of correctness need to be generalized to handle non-deterministic exe- 
cution as well as possibly infinite execution semantics of reactive programs that 
are designed to run indefinitely. 

For our model-checking applications, typically we are given some program
p and some temporal logic specification $\psi$ to check. In the ideal case we would
like the slicer to produce an executable program $p_s$ where $\psi$ holds for $p_s$ if and
only if $\psi$ holds for p. The key point here is that the slicer needs only to preserve
the semantics for the parts of p that influence the satisfaction of $\psi$. Intuitively,
there are nodes/actions in the program that are observable with respect to $\psi$
(they influence the satisfaction of $\psi$), but other nodes/actions that correspond
to silent moves or non-observables with respect to $\psi$.

The discussion above suggests that appropriate notions of correctness for slic- 
ing concurrent programs can be derived from the notion of weak-bisimulation 
found in concurrency theory. Intuitively, the notion of projection can be gener- 
alized to a notion of weak-bisimulation where actions associated with the nodes 
in the slicing criterion C are considered “observable” ; otherwise the actions can 
be considered “silent” (analogous to $\tau$-moves in CCS [10]).

The following definition (analogous to the CCS notion of derivative) describes 
the situation where a program can do zero or more non-observable moves, then 
one observable move, followed by zero or more non-observable moves. In addition, 
the definition forces the values for variables referenced at the observable node to 
match the values contained in a given store a. This will allow us to maintain a 
correspondence between the store manipulated by p and $p_s$.



Definition 2 ($(n, \sigma)$-derivative). Let C be a slicing criterion for p, and let
$n \in C$ and $\sigma$ be a store where $domain(\sigma) = ref(n)$. Then define the relation
$c_1 \stackrel{n, \sigma}{\Longrightarrow}_C c_2$ to hold when there exist configurations $c_1', c_2'$ such that

$$c_1 \stackrel{\neg C}{\longmapsto}{}^{*}\ c_1' \stackrel{n}{\longmapsto} c_2' \stackrel{\neg C}{\longmapsto}{}^{*}\ c_2$$

and for all $x \in ref(n)$, $\sigma(x) = \sigma_1'(x)$ where $\sigma_1'$ is the store
from configuration $c_1'$.



Based on the notion of $(n, \sigma)$-derivative, the following definition specifies what
it means for two programs to simulate each other with respect to the observable 
actions given in the slicing criterion. 

Definition 3 (C-bisimulation). Given two programs $p_1$ and $p_2$, let C be a
slicing criterion for $p_1$ and $C \subseteq N_2$ where $N_2$ is the set of nodes in $p_2$’s CFG. A
binary relation $S \subseteq \textit{Configurations}[p_1] \times \textit{Configurations}[p_2]$ is a C-bisimulation
if whenever $(c_1, c_2) \in S$ then

(i) for any $n \in C$ and store $\sigma$, if $c_1 \stackrel{n, \sigma}{\Longrightarrow}_C c_1'$ in $p_1$ then there exists a $c_2'$ such
that $c_2 \stackrel{n, \sigma}{\Longrightarrow}_C c_2'$ in $p_2$ and $(c_1', c_2') \in S$, and
(ii) for any $n \in C$ and store $\sigma$, if $c_2 \stackrel{n, \sigma}{\Longrightarrow}_C c_2'$ in $p_2$ then there exists a $c_1'$ such
that $c_1 \stackrel{n, \sigma}{\Longrightarrow}_C c_1'$ in $p_1$ and $(c_1', c_2') \in S$.

Two configurations are C-bisimilar if they are related by a C-bisimulation.

Definition 4 (C-bisimilarity). Let C be a slicing criterion for $p_1$ and $p_2$, and
let $c_1$ and $c_2$ be $p_1$ and $p_2$ configurations, respectively. Then $c_1$ and $c_2$ are
C-bisimilar (written $c_1 \approx_C c_2$) if there exists a C-bisimulation S and $(c_1, c_2) \in S$.
In other words, $\approx_C\ =\ \bigcup \{S \mid S \text{ is a C-bisimulation}\}$.

Now, $p_s$ is said to be a correct slice of p with respect to C if their initial
configurations are C-bisimilar.







Definition 5 (program slice). Let C be a slicing criterion for p. Program
$p_s$ is a slice of p with respect to C if $c \approx_C c_s$ where c and $c_s$ are the initial
configurations for p and $p_s$, respectively.

Note that if p consists of a single thread and has a finite execution trace,
the notion of bisimulation gives the same relation between p and $p_s$ as the usual
definition of projection (e.g., as originally defined by Weiser [13]).

4 Computing Multi-threaded Program Slices 

Given a slicing criterion $C = \{n_1, \ldots, n_k\}$, computing a slice of $p$ involves
finding the nodes in $p$ upon which the statements at nodes $n_i$ depend. These
nodes are often referred to as relevant nodes. Relevant variables are variables that
are defined or referenced at relevant nodes. In the slicing literature, constructing
the set of relevant nodes is often performed in two stages. In the first stage, one
builds a program dependence graph (PDG) that captures various dependencies
between the nodes in $p$’s control-flow graph. In the second stage, the nodes
upon which the $n_i$ depend are found by computing the transitive closure of the
dependences in the PDG with respect to the $n_i$.

We begin by describing each of the types of dependence relations that are required
for slicing CFCL programs. Data-dependence [12] is related to the notion
of reaching definition: a node n is data-dependent on node m if, for a variable v
referenced at n, a definition of (i.e., an assignment to) v at m reaches n. Thus,
node n depends on node m because the assignment at m can influence a value
computed at n.

Definition 6 (data dependence). Node $n$ is data-dependent on $m$ (written
$n \stackrel{dd}{\rightarrow} m$) if there is a variable $v$ such that (1) there exists a non-trivial path $p$
from $m$ to $n$ such that for every node $m' \in p - \{m, n\}$, $v \notin \mathit{def}(m')$, and (2)
$v \in \mathit{def}(m) \cap \mathit{ref}(n)$.

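For example, in Figure 2, [prod_put.2] $\stackrel{dd}{\rightarrow}$ [prod_put.1], [prod_check.1] $\stackrel{dd}{\rightarrow}$
[prod_put.1], [prod_put.1] $\stackrel{dd}{\rightarrow}$ [prod_put.1], [prod_update_tally.1] $\stackrel{dd}{\rightarrow}$
[prod_update_tally.1], [prod_cycle.1] $\stackrel{dd}{\rightarrow}$ [prod_cycle.1], and similarly for the
consumer thread.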
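Definition 6 can be checked mechanically on a small control-flow graph. The following is a minimal sketch, not the authors' implementation: the class `DataDependence`, its `addNode` and `vars` helpers, and the four-node example graph are all hypothetical names and data introduced here for illustration. The search starts from the successors of m (so the path is non-trivial) and refuses to pass through any node other than m or n that redefines v.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical CFG model: every node has successor labels plus def/ref variable sets.
class DataDependence {
    private final Map<String, List<String>> succ = new HashMap<>();
    private final Map<String, Set<String>> def = new HashMap<>();
    private final Map<String, Set<String>> ref = new HashMap<>();

    void addNode(String id, Set<String> defs, Set<String> refs, String... successors) {
        def.put(id, defs);
        ref.put(id, refs);
        succ.put(id, Arrays.asList(successors));
    }

    static Set<String> vars(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // True if n is data-dependent on m in the sense of Definition 6.
    boolean dataDependent(String n, String m) {
        for (String v : def.get(m)) {
            if (ref.get(n).contains(v) && reachesUnkilled(m, n, v)) {
                return true;
            }
        }
        return false;
    }

    // Looks for a path m -> ... -> n (at least one edge) on which no node
    // other than m and n defines v.
    private boolean reachesUnkilled(String m, String n, String v) {
        Deque<String> work = new ArrayDeque<>(succ.get(m));
        Set<String> seen = new HashSet<>(succ.get(m));
        while (!work.isEmpty()) {
            String x = work.pop();
            if (x.equals(n)) {
                return true;
            }
            if (!x.equals(m) && def.get(x).contains(v)) {
                continue; // x redefines v, so the definition at m does not pass through x
            }
            for (String y : succ.get(x)) {
                if (seen.add(y)) {
                    work.push(y);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DataDependence g = new DataDependence();
        g.addNode("a", vars("x"), vars(), "b");        // a: x := 1
        g.addNode("b", vars("x"), vars(), "c");        // b: x := 2
        g.addNode("c", vars("y"), vars("x"), "d");     // c: y := x
        g.addNode("d", vars("i"), vars("i"), "d");     // d: i := i + 1, looping on itself
        System.out.println(g.dataDependent("c", "b")); // true
        System.out.println(g.dataDependent("c", "a")); // false, b kills the definition at a
        System.out.println(g.dataDependent("d", "d")); // true, via the self-loop
    }
}
```

The self-loop case mirrors the reflexive examples above, where a node that both defines and references a variable inside a loop is data-dependent on itself.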

Control dependence [12] information identifies the conditionals that may affect
execution of a node in the slice.

Definition 7 (control dependence). Node $n$ is control-dependent on $m$ (written
$n \stackrel{cd}{\rightarrow} m$) if (1) there exists a non-trivial path $p$ from $m$ to $n$ such that every
node $m' \in p - \{m, n\}$ is post-dominated by $n$, and (2) $m$ is not post-dominated
by $n$.

For a node n to be control-dependent on m, m must have at least two immediate
successors in the CFG (i.e., m must be a conditional), and there must be two paths
that connect m with e such that one contains n and the other does not. For
example, in Figure 2, [prod_wakeup.1] is control-dependent on [prod_put.2],
but [prod_release_lock.1] is not (since it post-dominates [prod_put.2]).

We noted above that in existing work on slicing, the goal is to construct slices 
that preserve the (projection) semantics of terminating program executions. To 
preserve semantics of non-terminating executions, one needs to make sure that 
the slice includes any program points lying along paths to relevant nodes that
could cause non-termination. 

Definition 8 (divergence dependence). Node $n$ is divergence-dependent on
node $m$ (written $n \stackrel{\Omega}{\rightarrow} m$) if (1) $m$ is a pre-divergence point, and (2) there exists
a non-trivial path $p$ from $m$ to $n$ such that no node $m' \in p - \{m, n\}$ is a
pre-divergence point.

Note that this definition will allow slicing to remove infinite loops that cannot 
infinitely delay the execution of a relevant node. For the producer thread in 
Figure 2, the pre-divergence points are [prod_check.1] and [prod_cycle.2]. Since
all the nodes in the producer thread are reachable from these two nodes, all nodes 
in the thread are divergence-dependent on them.^ 

We now consider dependencies that arise due to concurrent execution in 
CFCL. Interference dependence [1] captures the situation where definitions to 
shared variables can “reach” across threads. 

Definition 9 (interference dependence). A node $n$ is interference-dependent
on node $m$ (written $n \stackrel{id}{\rightarrow} m$), if $\theta(n) \neq \theta(m)$, and there is a variable $v$, such
that $v \in \mathit{def}(m)$ and $v \in \mathit{ref}(n)$.

For example, in Figure 2, references of count at [prod_check.1], [prod_put.1],
and [prod_put.2], are all interference dependent on the definition of count at
[cons_get.1].
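Because Definition 9 ignores control flow entirely, it reduces to a comparison of thread identity and def/ref sets. The sketch below illustrates this under stated assumptions: the `Node` class and the `interferenceDependent` method are hypothetical, and the def/ref sets given to the three nodes are assumed for illustration rather than copied from Figure 2.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class InterferenceDependence {
    // Hypothetical node model: label, owning thread theta(n), and def/ref variable sets.
    static final class Node {
        final String label;
        final String thread;
        final Set<String> def;
        final Set<String> ref;

        Node(String label, String thread, Set<String> def, Set<String> ref) {
            this.label = label;
            this.thread = thread;
            this.def = def;
            this.ref = ref;
        }
    }

    static Set<String> vars(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // True if n is interference-dependent on m: the nodes belong to different
    // threads, and m defines some variable that n references.
    static boolean interferenceDependent(Node n, Node m) {
        if (n.thread.equals(m.thread)) {
            return false;
        }
        for (String v : m.def) {
            if (n.ref.contains(v)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The def/ref sets below are assumed for illustration only.
        Node check = new Node("prod_check.1", "producer", vars(), vars("count"));
        Node put = new Node("prod_put.1", "producer", vars("count"), vars("count"));
        Node get = new Node("cons_get.1", "consumer", vars("count"), vars("count"));
        System.out.println(interferenceDependent(check, get)); // true
        System.out.println(interferenceDependent(check, put)); // false, same thread
    }
}
```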

If a relevant variable is defined at node n inside of some critical region, then
the locking associated with that region must be preserved (i.e., the corresponding
enter-monitor and exit-monitor commands must appear in the slice). Omitting
the monitor might allow shared variable interference that was not present
in the original program. In this situation, we say that n is synchronization-dependent
on the inner-most enclosing enter-monitor and exit-monitor nodes
(dependence in the case of nested critical regions will be captured by the transitive
closure of this relation).

In Figure 2, if the variable count is a relevant variable for a slicing criterion 
C, then the buffer monitors are relevant as well — thus, preventing concurrent 
increments and decrements of count from interfering with each other. 

Definition 10 (synchronization dependence). A node $n$ is synchronization-dependent
on node $m$ (written $n \stackrel{sd}{\rightarrow} m$), if $CR(n) = (m_1, m_2)$ and $m \in \{m_1, m_2\}$.

^ One might imagine a conservative analysis that can distinguish simple cases where 
loops are guaranteed to converge, and thus eliminate exit conditionals of these loops 
from the set of pre-divergence points. 
Just as we introduced divergence dependence to capture the situation where
an infinite loop may prevent the execution of some observable node, we now
introduce the notion of ready dependence. Informally, a statement n is ready-dependent
on a statement m if m’s failure to complete (either because it is never
reached or because, for wait, the notify never occurs) can make $\theta(n)$ block
before reaching or completing n, thus delaying n’s execution indefinitely. The
following definition for ready dependence is fairly conservative and we will discuss
in the following section how it can be refined based on the notion of safe lock.

Definition 11 (ready dependence). A node $n$ is ready-dependent on node
$m$ (written $n \stackrel{rd}{\rightarrow} m$), if

1. $\theta(n) = \theta(m)$ and $n$ is reachable from $m$ in the CFG of $\theta(m)$ and $\mathit{code}(m)$ = enter-monitor k, or
2. $\theta(n) \neq \theta(m)$ and $\mathit{code}(n)$ = enter-monitor k and $\mathit{code}(m)$ = exit-monitor k, or
3. $\theta(n) = \theta(m)$ and $n$ is reachable from $m$ and $\mathit{code}(m)$ = wait k, or
4. $\theta(n) \neq \theta(m)$ and $\mathit{code}(n)$ = wait k and $\mathit{code}(m) \in$ {notify k, notify-all k}.

Some of the ready dependences in Figure 2 are as follows: all nodes in the
producer thread are reachable from the wait at [prod_wait.1], so all nodes in
the thread are ready-dependent on it (by condition 3); all nodes in the thread
are ready-dependent on the enter-monitor tally at line [prod_enter_tally.1]
(by condition 1); the wait buffer at [prod_wait.1] is ready-dependent on the
notify buffer at [cons_wakeup.1] (by condition 4).
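The four conditions of Definition 11 translate almost directly into a predicate over pairs of nodes. The following is a rough sketch under stated assumptions: the `Node` and `Kind` types are hypothetical, the three-node producer loop is a simplification rather than the program of Figure 2, and "reachable" is taken to mean reachable along one or more CFG edges, which the definition does not spell out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReadyDependence {
    enum Kind { ENTER_MONITOR, EXIT_MONITOR, WAIT, NOTIFY, NOTIFY_ALL, OTHER }

    // Hypothetical node model; lock is null for nodes that are not synchronization commands.
    static final class Node {
        final String thread;
        final Kind kind;
        final String lock;
        final List<Node> succ = new ArrayList<>();

        Node(String thread, Kind kind, String lock) {
            this.thread = thread;
            this.kind = kind;
            this.lock = lock;
        }
    }

    // Assumption: "reachable" means reachable along one or more CFG edges.
    static boolean reachable(Node from, Node to) {
        Deque<Node> work = new ArrayDeque<>(from.succ);
        Set<Node> seen = new HashSet<>(from.succ);
        while (!work.isEmpty()) {
            Node x = work.pop();
            if (x == to) {
                return true;
            }
            for (Node y : x.succ) {
                if (seen.add(y)) {
                    work.push(y);
                }
            }
        }
        return false;
    }

    static boolean readyDependent(Node n, Node m) {
        if (n.thread.equals(m.thread)) {
            // Conditions 1 and 3: m is an enter-monitor or a wait from which n is reachable.
            return (m.kind == Kind.ENTER_MONITOR || m.kind == Kind.WAIT) && reachable(m, n);
        }
        // Condition 2: n enters a monitor that m, in another thread, exits.
        if (n.kind == Kind.ENTER_MONITOR && m.kind == Kind.EXIT_MONITOR) {
            return n.lock.equals(m.lock);
        }
        // Condition 4: n waits on a lock that m, in another thread, notifies.
        return n.kind == Kind.WAIT
                && (m.kind == Kind.NOTIFY || m.kind == Kind.NOTIFY_ALL)
                && n.lock.equals(m.lock);
    }

    public static void main(String[] args) {
        Node enter = new Node("producer", Kind.ENTER_MONITOR, "buffer");
        Node waitNode = new Node("producer", Kind.WAIT, "buffer");
        Node put = new Node("producer", Kind.OTHER, null);
        Node notifier = new Node("consumer", Kind.NOTIFY, "buffer");
        enter.succ.add(waitNode);
        waitNode.succ.add(put);
        put.succ.add(enter); // loop back-edge

        System.out.println(readyDependent(put, waitNode));      // true (condition 3)
        System.out.println(readyDependent(waitNode, notifier)); // true (condition 4)
        System.out.println(readyDependent(put, notifier));      // false
    }
}
```

Because of the back-edge, every node of the loop is reachable from the wait, which is why a single wait tends to pull the whole thread into the slice.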

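Given a program $p$, define the relation $\stackrel{d}{\rightarrow}$ with respect to $p$ to be the union
of the relations defined above with respect to $p$ (i.e., the union of $\stackrel{dd}{\rightarrow}$, $\stackrel{cd}{\rightarrow}$, $\stackrel{\Omega}{\rightarrow}$,
$\stackrel{id}{\rightarrow}$, $\stackrel{sd}{\rightarrow}$, and $\stackrel{rd}{\rightarrow}$). The PDG $P$ for $p$ consists of the nodes of the CFG $G$ for $p$ with edges
formed by the relation $\stackrel{d}{\rightarrow}$.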

In the PDG approach to slicing, constructing the program slice proceeds by
finding the set of nodes $S_C$ (called the slice set) in $p$'s CFG that are reachable
(via $\stackrel{d}{\rightarrow}$) from the nodes in $C$.

Definition 12 (slice set). Let $C$ be a slicing criterion for program $p$ and $P$ be
the PDG for $p$. Then the slice set $S_C$ of $p$ with respect to $C$ is defined as follows:

$$S_C = \{ m \mid n \in C \text{ and } n \stackrel{d}{\rightarrow}{}^{*} m \}.$$
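Definition 12 is a reachability computation over the PDG, so it can be sketched with a standard worklist traversal. In the example below, the class `SliceSet`, the map-based PDG encoding (each node maps to the nodes it depends on), and the node names are assumptions made for illustration. Since the closure is reflexive as well as transitive, the criterion nodes themselves belong to the slice set.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical encoding: the PDG maps each node to the nodes it depends on.
class SliceSet {
    // Returns every node m such that some criterion node n reaches m
    // along zero or more dependence edges.
    static Set<String> compute(Map<String, List<String>> dependsOn, Set<String> criterion) {
        Set<String> slice = new LinkedHashSet<>(criterion);
        Deque<String> worklist = new ArrayDeque<>(criterion);
        while (!worklist.isEmpty()) {
            String n = worklist.pop();
            for (String m : dependsOn.getOrDefault(n, Collections.<String>emptyList())) {
                if (slice.add(m)) {
                    worklist.push(m); // first visit: follow m's own dependences later
                }
            }
        }
        return slice;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pdg = new HashMap<>();
        pdg.put("n3", Arrays.asList("n2"));
        pdg.put("n2", Arrays.asList("n1"));
        pdg.put("n4", Arrays.asList("n1")); // n4 is not needed by the criterion
        System.out.println(compute(pdg, Collections.singleton("n3"))); // prints [n3, n2, n1]
    }
}
```

Each node enters the worklist at most once, so the traversal is linear in the size of the PDG once the dependence edges have been built.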

In addition to the nodes in $S_C$, the residual program must contain other nodes
(such as goto’s and matching enter-monitor/exit-monitor commands) to be
well-formed. Given $C$ and the slice set $S_C$, we briefly sketch how the residual program
is constructed (for details, see the extended version of the paper). If an assignment
or synchronization statement is in $S_C$, then it must appear in the residual
program. If an enter-monitor or its matching exit-monitor appears in $S_C$,
then it must appear in the residual program (and vice versa for exit-monitor).
All goto and return jumps must appear in the residual program. However, if
an if is not in $S_C$, then no node in $S_C$ is control dependent upon it. Therefore,
it doesn’t matter if execution follows the true branch or the false branch. In
this case, one can replace the conditional with a jump to the point where the
two branches merge. All other source statements can be omitted from the residual
program. This process yields a residual program p that contains “trivial” blocks
(blocks containing no assignments and a goto $\notin S_C$ as the jump). Unshared
trivial blocks (i.e., those that are not the target of two or more jumps) can be
removed in a simple post-processing phase.

As an example, consider slicing the program of Figure 2 using the criterion
C = {[prod_cycle.1]}. Since the threads of the example are tightly coupled, the
slicing algorithm correctly identifies that all statements except the assignments
to total and cons must appear in the residual program. In general, ready dependences
on wait statements end up causing a large amount of the synchronization
structure to be included in the slice. Intuitively, this is because it is impossible
in general to tell statically if the conditions for a thread to be “notified” will
eventually occur. However, one might expect that monitors that contain only
notify commands and/or assignments to irrelevant variables (such as the tally
monitor in the example) could be removed. In fact, the tally monitor can be
removed using the refinements based on safe locks presented below. Of course,
monitors containing wait can also be removed in certain situations where there
are no relevant nodes reachable from them.

Theorem 1 (correctness of slicing). Let $C$ be a slicing criterion for program
$p$. Let $p_s$ be the residual program constructed with respect to $p$ and $C$ according to the process
outlined above. Then program $p_s$ is a slice of $p$, i.e., the initial configurations of
$p$ and $p_s$ are C-bisimilar.

We have identified several common programming idioms in which ready de- 
pendence can cause an overly-large slice to be produced. We now describe how 
relatively inexpensive static analyses over the program flow graph can be ap- 
plied to refine ready dependences and consequently reduce the size of a slice 
while preserving its correctness. 

The example of Figure 2 contains a common coding pattern for program
threads: the threads have a main loop with regions in the loop body protected
by locks. Consider that the loop back-edge in such a thread t will force each node
in the loop body to be ready-dependent on each enter-monitor command in the
body. Each occurrence of enter-monitor k for some lock k is ready-dependent
on exit-monitor k commands in the other program threads $t' \neq t$. The resultant
dependencies will force all k-monitor related commands in all threads
to be included in the slice (however, if k' does not occur in t, k'-monitor commands
in threads other than t might be removed). Recall that ready dependence
implies that one of these enter-monitor commands may cause the
other k-monitor commands to be infinitely delayed. Since locks in most threaded
programs will not be held indefinitely, the presence of these dependencies will
usually be unnecessary (since we assume the JVM implementation guarantees
scheduling fairness).

We define a safe lock to be one which will never be held indefinitely. We
do not attempt to compute the exact set of safe locks, rather we focus on an
easily identified subset of the safe locks. A lock is safe if all paths between
matching enter-monitor k/exit-monitor k commands contain (1) no wait-free
loops, (2) no wait commands for other locks, and (3) no enter-monitor
or exit-monitor commands on unsafe locks. A wait-free loop is a loop which
has some path through its body that does not include a wait command. Lock
safety can be computed by a depth-first marking of the program flow graph
fragments rooted at enter-monitor commands; this results in a boolean lock
function safe(k).

For example, both the locks in Figure 2 satisfy the safety conditions above. 
The count lock will eventually be released at monitor exits, or at the wait on 
each iteration of the inner loop (e.g., the loop with header prod_check). It is
easy to see that the tally-monitors in both the producer and consumer are 
guaranteed to release the lock because each contains only a single assignment. 

Using lock safety information, we can refine Definition 11 by stating that ready
dependencies only arise from conditions (1) and (2) in Definition 11 if lock k is
unsafe (i.e., $\neg\mathit{safe}(k)$). Slicing the example program of Figure 2 on criterion
{[prod_cycle.1]} now removes the tally monitors in both the producer and
the consumer since [prod_cycle.1] no longer ready-depends on the producer
tally-monitor nor does it depend on any statement within the tally-monitor.
On the other hand, the buffer-monitors in both the producer and consumer
are still included in the slice since [prod_cycle.1] is ready-dependent on the
wait buffer command (which will end up causing the entire buffer synchronization
structure from both threads to be included). In Figure 2, the * annotations
indicate the statements that are removed from the program when slicing on
{[prod_cycle.1]} using the safe-lock refinement (note that post-processing will
retarget the goto’s in prod_release_lock and cons_release_lock to jump to
prod_cycle and cons_cycle respectively).

Formally speaking, individual thread actions that correspond to blocking on 
an enter-monitor command are unobservable. The only way that such block- 
ing can influence the observable behavior of a slice is if it is indefinite. Ready 
dependence for unsafe locks preserves such indefinite blocking in the slice. For 
safe locks, finite-duration blocking is equivalent to a finite-sequence of unobserv- 
able moves in the blocked thread, which is bisimilar to a system with no such 
unobservable moves. 



5 Related Work 

Static slicing of concurrent programs: Cheng [1] presents an approach to
static and dynamic slicing of concurrent programs based on a generalization of
PDG’s which he calls program dependence nets (PDN). PDN’s include edges for
what Cheng calls synchronization dependence, communication dependence and
selection dependence. His synchronization dependence roughly corresponds to
the synchronization dependence in our work. Interprocess communication in his
setting is channel-based, and his notion of communication dependence is analogous
to interference dependence. His selection dependence is a generalization of
control dependence to the non-deterministic choice operators appearing in his
language. As Tip notes [12], Cheng does not state or prove any property of the
slices computed by his algorithm.

Zhao [14] addresses static slicing of Java programs using thread dependence
graphs (TDGs). Although TDGs are able to handle method calls, they do not
incorporate notions such as divergence dependence or what we call synchronization
dependence (synchronized methods are handled, but not synchronized statements).
Of our ready dependencies, only dependencies associated with component
(4) are handled by TDGs, and they do not incorporate the notion of safe
lock. Finally, there is no notion of correctness defined.

Krinke [7] considers static slicing of multi-threaded programs with shared
variables, and focuses on issues associated with interference dependence. He
notes that considering interference dependence to be transitive (as we have done)
can result in slices that can be viewed as overly imprecise. Krinke uses the notion
of a witness (a sequence of configurations that can be extracted from a valid
execution trace of the program) to filter out situations where transitive interference
dependencies give imprecise results. The filtering has worst-case exponential
behavior in the number of transitive dependence edges. While this approach to
more precise slices is intriguing, it is unclear to us if it has significant practical
benefits which would outweigh the extra conceptual overhead in the description
of slicing as well as the associated computational cost.

In Krinke’s language (which includes threads with fork/join operations), 
there is no explicit synchronization mechanism, so he does not include notions 
analogous to synchronization dependence or ready dependence. Also, he does 
not consider the notion of divergence dependence, and does not state or prove 
any property of the slices computed by his algorithm. 

Slicing as a tool for model construction: Millett and Teitelbaum [6]
study static slicing of Promela (the model description language for the model-checker
SPIN [5]) and its application to model checking, simulation, and protocol
understanding, based on the extension and modification of Cheng’s work. They
emphasize that even imprecise slices can give useful reductions for model checking.
However, they do not formalize the semantics of concurrency in Promela
nor their slicing methods.

Clarke et al. [2] present a tool for slicing VHDL programs with dependence
graphs. Their PDG approach is based on data and control dependence along
with a new notion called signal dependence, which is similar to synchronization
dependence. Their slicer integrates other transformations based on properties
that appear in hardware design languages. They briefly discuss relationships
between slicing and model-checking optimizations.

6 Conclusions and Future Work 

We have constructed a prototype implementation that processes a subset of the
Jimple intermediate language using the ideas presented in this paper. Based
on preliminary experiments with real Java systems, it appears that the safe-lock
refinements presented in Section 4 are crucial for slicing away significant
amounts of synchronization activity [4]. For example, in one real Java application
which consisted of five threads (organized into a pipeline topology), slicing the
system directed by an LTL formula that specifies appropriate shutdown of one
thread yielded a residual program with the other threads removed. However, the
unrefined slicing algorithm that did not take safe locks into account was only able
to slice away a very small portion of the code, because the threads were coupled
through wait-notify related ready dependences. Although the examples that we
have examined contain many of the common idioms of concurrent Java systems,
more experiments across a broad range of concurrent software architectures are
needed to confirm the effectiveness of this approach.

We are scaling up our previous formal study of modal-logic-directed slicing 
[3] to the language features of the present work. Obviously, more work is needed 
to treat features of Java that we have not addressed. 
Abstract. This paper presents and evaluates a set of analyses designed to
reduce synchronization overhead in Java programs. Monitor-based 
synchronization in Java often causes significant overhead, accounting for 
5-10% of total execution time in our benchmark applications. To reduce this 
overhead, programmers often try to eliminate unnecessary lock operations by 
hand. Such manual optimizations are tedious, error-prone, and often result in 
poorly structured and less reusable programs. Our approach replaces manual 
optimizations with static analyses that automatically find and remove 
unnecessary synchronization from Java programs. These analyses optimize 
cases where a monitor is entered multiple times by a single thread, where one 
monitor is nested within another, and where a monitor is accessible by only one 
thread. A partial implementation of our analyses eliminates up to 70% of 
synchronization overhead and improves running time by up to 5% for several 
already hand-optimized benchmarks. Thus, our automated analyses have the 
potential to significantly improve the performance of Java applications while 
enabling programmers to design simpler and more reusable multithreaded code. 



1. Introduction 



Monitors [LR80] are appealing constructs for synchronization because they promote 
reusable code and present a simple model to the programmer. Many modern programming 
languages, such as Java [GJS96] and Modula-3, directly support monitors. While these 
constructs enable programmers to easily write multithreaded programs and reusable 
components, they can incur significant run time overhead. Reusable code modules may contain 
synchronization for the most general case of concurrent access, even though particular 
programs often use these modules in a context that is already protected from concurrency. For 
instance, a synchronized data structure may be accessed by only one thread at run time, or 
access to a synchronized data structure may be protected by another monitor in the program. In 
both cases, unnecessary synchronization increases execution overhead. As described in section 
2, even singlethreaded Java programs typically spend 5-10% of their execution time on 
unnecessary synchronization operations. 

Synchronization overhead can be reduced by manually restructuring programs [SNR+97], 
but this typically involves trading off program performance against simplicity, maintainability, 
and reusability. To improve performance, synchronization annotations can be omitted where 
they are not needed for correctness in the current version of the program, or synchronized
methods can be modified to provide specialized, fast entry points for threads that already hold a 
monitor lock. Such specialized functions make the program more complex, and using them 
safely may require careful reasoning about object-oriented dispatch to ensure that the protecting 
lock is acquired on all paths to the function call. The assumption that a lock is held at a 
particular program point may be unintentionally violated by a change in some other part of the 
program, making program evolution and maintenance error-prone. Hand optimizations make 
code less reusable, because they make assumptions about synchronization that may not be valid 
when a component is reused in another setting. In general, complex manual optimizations 
make programs harder to understand, make program evolution more difficult, reduce the 
reusability of components, and create an opportunity for subtle concurrency bugs to arise. 

In this paper, we present and evaluate static analyses that reduce synchronization overhead 
by automatically detecting and removing unnecessary synchronization. A synchronization 
operation is unnecessary if there can be no contention between threads for the synchronization 
operation. For example, if a monitor is only accessible by a single thread throughout the 
lifetime of the program, there can be no contention for the monitor, and thus all operations on 
that monitor can safely be eliminated. Similarly, if threads always acquire one monitor and 
hold it while acquiring another monitor, there can be no contention for the second monitor, and 
this unnecessary synchronization can safely be removed. Finally, when a monitor is acquired 
by the same thread multiple times in a nested fashion, the first monitor acquisition protects the 
others from contention and therefore all nested synchronization operations may be optimized 
away. In order to reason statically about synchronization, we assume the compiler has 
knowledge of the whole program at analysis time; future work may extend our techniques to 
handle Java’s dynamic code loading and reflection features. 

There are three main contributions of this paper. First, we describe several synchronization 
optimization opportunities and measure their frequency of occurrence in several Java programs. 
Second, we provide precise definitions for a family of analyses designed to detect unnecessary 
synchronization. Finally, we present a preliminary empirical evaluation of these analyses on a 
suite of benchmarks. Our partial implementation eliminates up to 70% of synchronization 
overhead and improves running time by up to 5% for typical Java benchmarks on a highly 
optimized platform. 

The rest of the paper is structured as follows. The next section describes the Java 
synchronization model, and provides measurements of synchronization overhead for typical 
benchmarks. Section 3 identifies opportunities for optimizations. Section 4 provides a precise 
description for a set of analyses that detect and eliminate unnecessary synchronization 
operations. Section 5 summarizes the performance impact of these analyses on a set of 
benchmarks, section 6 discusses related work, and section 7 concludes. 



2. Java Synchronization 



Java provides a monitor construct to protect access to shared data structures in a multithreaded 
environment. 



2.1 Semantics 

The semantics of monitors in Java are derived from Mesa [GMS77]. Each object is implicitly 
associated with a monitor, and any method can be marked synchronized. When executing 




Static Analyses for Eliminating Unnecessary Synchronization from Java Programs 21 



[Bar chart: percentage of execution time spent on synchronization under JDK 1.2.0 and Vortex for the benchmarks jlex, javacup, javac, pizza, and cassowary]



Fig. 1. Overhead of Synchronization 



a synchronized method, a thread acquires the monitor associated with the receiver object,¹
runs the method’s code, and then releases the monitor. An explicit synchronization statement 
provides a way to manipulate monitors at program points other than method invocations. Java’s 
monitors are reentrant, meaning that a single thread can acquire a monitor multiple times in a 
nested fashion. A reentrant monitor is only released when the thread exits the outermost 
method or statement that synchronizes on that monitor. 
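The reentrancy rule can be observed directly with the standard `Thread.holdsLock` method. The following is a small illustrative sketch; the class `ReentrantMonitorDemo` and its `log` field are names invented here, not part of the benchmarks discussed in this paper.

```java
import java.util.ArrayList;
import java.util.List;

class ReentrantMonitorDemo {
    final List<String> log = new ArrayList<>();

    synchronized void outer() {
        inner(); // the thread already holds this monitor, so re-entry does not block
        // The monitor is still held here: it is released only when outer() returns.
        log.add("after inner: " + Thread.holdsLock(this));
    }

    synchronized void inner() {
        log.add("in inner: " + Thread.holdsLock(this));
    }

    public static void main(String[] args) {
        ReentrantMonitorDemo demo = new ReentrantMonitorDemo();
        demo.outer();
        demo.log.add("after outer: " + Thread.holdsLock(demo));
        System.out.println(demo.log);
        // prints [in inner: true, after inner: true, after outer: false]
    }
}
```

Returning from `inner` does not release the monitor; only leaving the outermost synchronized method does.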



2.2 Cost 

Synchronization represents a significant performance bottleneck for a set of Java benchmarks. 
To quantify the cost of synchronization operations, we compared singlethreaded Java programs 
to versions of the same programs where synchronization has been removed from both the 
application and the standard Java library. Since the correctness of multithreaded benchmarks 
depends on the presence of synchronization, we did not perform these measurements on 
multithreaded benchmarks. However, the unnecessary synchronization present in 
singlethreaded programs suggests that a significant amount of the synchronization in 
multithreaded programs is also unnecessary. 

We used a binary rewriter [SGA+98] to eliminate all synchronization operations from the 
application binaries. This strategy allowed us to perform measurements on commercial Java 
virtual machines without having to instrument and recompile them at the source level. 

We examine the benchmarks using two different Java implementations that are 
representative of different Java virtual machine implementations. The JDK 1.2.0 embodies a 
hybrid JIT compilation and interpretation scheme, and features an efficient implementation of 
lock operations. Consequently, it represents the state of the art in commercially available Java 
virtual machines. Vortex, an aggressively optimizing research compiler [DDG+96], produces 
natively compiled stand-alone executables and uses efficient synchronization primitives 
[BKM+98]. For these figures, we use the base Vortex system, which does not contain the 
analyses described in this paper. 

Figure 1 shows the percentage of total execution time spent on synchronization in five 
singlethreaded benchmarks for each platform. Synchronization overhead averages 5-10% of 



¹ static synchronized methods acquire the monitor associated with the Class object for
the enclosing class.
class Reentrant {
    synchronized foo() {
        this.bar();
    }
    synchronized bar()
    { ... }
}

Fig. 2. Reentrant Monitors

class Enclosing {
    Enclosed member;
    synchronized foo() {
        member.bar();
    }
}
class Enclosed {
    synchronized bar()
    { ... }
}

Fig. 3. Enclosed Monitors



execution time, depending on the platform, and can be as high as 35%. The relative cost of
synchronization varies between the platforms because of the varying amounts of optimization 
they perform in the compilation process, and their different synchronization implementations. 
For example, if Vortex is able to optimize the non-synchronization-related parts of a 
benchmark like jlex more effectively than the JDK 1.2.0, its synchronization overhead will be 
relatively more significant. In contrast, the benchmarks javac and cassowary may use 
synchronization in a way that is more expensive on the JDK platform than on Vortex. Despite 
the variations between platforms, synchronization overhead represents a significant portion of 
the execution time for these Java benchmarks, demonstrating that there is considerable room 
for performance improvement over current synchronization technology. 



3. Optimization Opportunities 



In this section, we describe three different opportunities for optimizing synchronization 
operations. 



3.1 Reentrant Monitors 

Reentrant monitors present the simplest form of unnecessary synchronization. As illustrated in 
Figure 2, a monitor is reentrant when one synchronized method calls another with the same 
receiver object. It is safe to remove synchronization from bar if all calls to bar reachable 
during program execution are within procedures that synchronize on the same receiver object. 
Our optimization generalizes this example to arbitrary call paths: synchronization on the 
receiver object O of method bar may be removed if along every reachable path in the call 
graph to bar there is a method or statement synchronized on the same object O. 

If the receiver object’s monitor has been entered along some, but not all, call paths to 
method bar, specialization can be used to create two versions of bar: an unsynchronized 
version for the call paths where the receiver is already synchronized, and a synchronized 
version for the other call paths. The synchronized version acquires the lock and then simply 
calls the unsynchronized version. For example, if bar is also called from the function main, 
where the receiver object is not synchronized, bar could be specialized so that main calls a 
synchronized version that acquires a monitor. Methods like foo that have already locked the 
receiver object can still call the more efficient, unsynchronized version of bar. 
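Written out by hand, the specialization described above might look like the following sketch. The class `SpecializedReentrant`, the method name `barUnsynchronized`, and the `calls` counter are hypothetical and stand in for whatever a compiler would generate; the `Thread.holdsLock` guard is included only to make the precondition visible and would not appear in optimized code.

```java
class SpecializedReentrant {
    private int calls;

    // foo already holds the receiver's monitor, so it can call the
    // unsynchronized version directly.
    synchronized void foo() {
        barUnsynchronized();
    }

    // Synchronized entry point for callers, such as main, that do not hold the monitor.
    synchronized void bar() {
        barUnsynchronized();
    }

    // Hypothetical name for the specialized body; callers must hold the monitor on this.
    private void barUnsynchronized() {
        if (!Thread.holdsLock(this)) {
            throw new IllegalStateException("caller must hold the receiver's monitor");
        }
        calls++;
    }

    synchronized int calls() {
        return calls;
    }

    public static void main(String[] args) {
        SpecializedReentrant r = new SpecializedReentrant();
        r.foo(); // one monitor acquisition instead of two
        r.bar(); // acquires the monitor, then runs the same body
        System.out.println(r.calls()); // prints 2
    }
}
```

The unsynchronized version is private here so that no caller outside the class can reach it without the lock, which is the invariant the analysis has to establish for every call path.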
class PrintWriter {
    Object lock;
    Writer out;
    void write(int c) {
        synchronized (lock) {
            out.write(c);
        }
    }
}
class StringWriter {
    synchronized write(int c)
    { ... }
}

Fig. 4. Immutable Paths



3.2 Enclosed Monitors 

An enclosed monitor is a monitor that is already protected from concurrent access by another 
monitor. The enclosing monitor is always entered first, and while it is held the enclosed 
monitor is acquired. Later, the enclosed monitor and then the enclosing monitor will be 
released. Because the enclosed monitor is only entered when the enclosing monitor is held, it is 
protected from concurrent access and is unnecessary. For example, in Figure 3 the monitor on 
the member object is enclosed by the monitor on the Enclosing object. Thus the 
synchronization on the bar function is unnecessary and may be removed. 

In order to remove synchronization safely from a monitor M during static analysis, we must 
prove there is a unique, unchanging enclosing monitor that protects M, not one of several 
enclosing monitors. If there were several Enclosing objects in Figure 3, for example, 
different threads could access the Enclosed object concurrently by going through different 
Enclosing objects, and it would be unsafe to remove synchronization from bar. There are 
four ways we can ensure this is the case: 

First, the enclosing monitor may store the enclosed monitor in an unshared field — a field 
that holds the only reference to the enclosed object. Since the unshared field holds the only 
reference to the enclosed object, the only way to enter the enclosed object's monitor is to go 
through the (unique) enclosing object. We can relax the "only reference" condition in the 
definition of an unshared field if we use the name of the field to identify the enclosing lock. As 
long as each enclosed object is only stored in one instance (i.e., run-time occurrence) of that field, 
it is permissible for other fields and local variables to refer to the enclosed object, because the 
field name uniquely identifies the enclosing object. 

Second, the enclosing monitor may be stored in an immutable static field, i.e. a global 
variable that does not change value. Because the enclosing monitor is identified by the static 
field, and only one object is ever stored in that static field, the field name uniquely identifies a 
monitor. The static field's monitor M encloses another monitor M’ if all synchronization 
operations on M’ execute from within monitor M. 

Third, the enclosing monitor may be stored in an immutable field of the enclosed monitor. 
Since an immutable field cannot change, the same enclosing monitor is always entered before 
the enclosed monitor. This case occurs when a method first synchronizes on a field of the 
receiver object, then on the receiver object itself 

    class Local {
        synchronized foo()
            { ... }
    }

    main() {
        new Local().foo();
    }

Fig. 5. Thread-Local Monitors 

Fig. 6. Optimization Potential 



Fourth, the cases above can be combined. For example, Figure 4 illustrates an example 
similar to cases in the JDK 1.2.0 I/O library, where a stream object first synchronizes on an 
object in one of its fields, then calls a synchronized method on the object in another field. In 
the example, it is safe to remove the synchronization on StringWriter.write because the 
lock object of an enclosing stream is always locked before calling write. Since lock is an 
immutable field of PrintWriter and out is an unshared field of PrintWriter, we can 
use transitivity to determine that there is a unique enclosing object (lock) for each enclosed 
object (out). Using transitivity, we can combine a sequence of immutable and unshared fields 
into a unique path from the enclosed monitor to the enclosing monitor. A unique path 
identifies a unique enclosing object relative to a particular enclosed object. 
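
As an illustration, the code of Figure 4 might look as follows once the enclosed synchronization has been removed. The names EnclosingWriter and EnclosedBuffer are hypothetical stand-ins for PrintWriter and StringWriter, and the sketch assumes that the buffer is never stored in a second writer, which is the unshared-field condition described above.

```java
// Hypothetical names; mirrors the shape of Figure 4 after the optimization.
class EnclosedBuffer {
    private final StringBuilder chars = new StringBuilder();

    // Synchronization removed: every caller reaches this method while
    // holding the lock object of the unique enclosing writer.
    void write(int c) {
        chars.append((char) c);
    }

    String contents() {
        return chars.toString();
    }
}

class EnclosingWriter {
    // Immutable field: always refers to the same lock object.
    private final Object lock = new Object();
    // Unshared field (assumed): this buffer is stored in no other writer.
    private final EnclosedBuffer out = new EnclosedBuffer();

    void write(int c) {
        synchronized (lock) {
            out.write(c);
        }
    }

    String contents() {
        synchronized (lock) {
            return out.contents();
        }
    }
}
```

The path from the enclosed object to its protecting lock goes backward through out and then forward through lock, which is the combination of unshared and immutable links that the transitivity argument relies on.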

The general rule we have developed can be stated as follows: 

A synchronization statement S may be removed if, for every other synchronization 
statement S’ that could synchronize on the same object as S, there exists a unique path 
of links such that: 

1. The first link represents the object synchronized on by S and S’. 

2. Each subsequent link is either an unshared field of an object that encloses the 
link before or an immutable field that is enclosed by the link before. 

3. The last link represents an object that is synchronized on all call paths that 
reach S and is also synchronized on all call paths that reach S’. 

As in the case of reentrant monitors, synchronization statements on enclosed objects may be 
specialized if it is legal to remove synchronization on some instances of a class but not others. 
For example, the root node in a binary tree encloses all of the inner nodes, so specialization 
could create two kinds of nodes: one that is synchronized for creating the root of a binary tree, 
and one that is unsynchronized for creating the inner nodes of the tree. 



3.3 Thread-Local Monitors 

Figure 5 shows an example of a thread-local monitor. Instances of the Local class are only 
accessible by the thread that created them, because they are created on the stack and are not 
accessible via any static field. Since static fields are the only base case for sharing data 
between threads in Java’s memory model, it is safe to remove synchronization on methods of 
any class that is unreachable from static fields. In our model, Thread and its subclasses are 
stored in a global list, so that passing references from one thread to another during thread 
creation is handled correctly. Specialization can eliminate synchronization when some 
instances of a class are thread-local and other instances are not. 



3.4 Optimization Potential 

Figure 6 shows an estimate of the opportunities for optimization in our benchmark suite, 
demonstrating that different programs present different optimization opportunities. This data 
was collected from dynamic traces of the five Java programs running on the JDK 1.1.6. For 
each benchmark, it shows the percentages of dynamic monitor operations that were reentrant, 
enclosed (by a different monitor), and thread-local, representing an upper bound for how well 
our analyses could perform. The bars may add up to more than 100% because some 
synchronization operations may fall into several different categories. All the benchmarks do 
100% of their synchronization on thread-local monitors because they are singlethreaded, and so 
no monitor is ever locked by more than one thread. Multithreaded benchmarks would have 
some synchronization that is not thread-local, but we believe that thread-local monitors would 
still represent a significant opportunity in these benchmarks. 

The benchmarks differ significantly in the optimization opportunities they present. For 
example, 41% of the synchronization in jlex is reentrant but less than 1% is enclosed. In 
contrast, 97% of the synchronization in javac is enclosed and virtually none is reentrant. For 
these singlethreaded benchmarks, thread-local monitors present the greatest opportunity for 
optimization, with two programs gaining significant benefit from enclosing or reentrant 
monitors. This data demonstrates that each kind of optimization is important for some Java 
programs. 



4. Analyses 



We define a simplified analysis language and describe three analyses necessary to optimize the 
synchronization opportunities discussed above: lock analysis, unshared field analysis, and 
multithreaded object analysis. Lock analysis computes a description of the monitors held at 
each synchronization point so that reentrant locks and enclosed locks can be eliminated. 
Unshared field analysis identifies unshared fields so that lock analysis can safely identify 
enclosed locks. Finally, multithreaded object analysis identifies which objects may be 
accessible by more than one thread. This enables the elimination of all synchronization on 
objects that are not multithreaded. Our analyses can rely on Java’s final annotation to detect 
immutable fields; an important area of future work is to detect immutable fields that are not 
explicitly annotated as final. 



4.1 Analysis Language 

We describe our analyses in terms of a simple expression-based core language, incorporating 
the essential synchronization-related aspects of Java. This allows us to focus on the details 
relevant to specifying the analyses while avoiding some of the complexity of a real language. It 
is straightforward to handle the missing features of Java: our prototype implementation 
handles all of the Java language except reflection and dynamic code loading, which are omitted 
to enable static reasoning. 

    id, field, fn ∈ ID
    label         ∈ LABEL
    key           ∈ KEY
    e, program    ∈ E

    E ::= new^KEY
        | ID
        | let ID := E1 in E2
        | E.ID
        | E1.ID := E2
        | E1 op E2
        | synchronized^LABEL (E1) { E2 }
        | if E1 then E2 else E3
        | ID(E1, ..., En)

Fig. 7. Core Analysis Language 

Figure 7 presents our analysis language. It is a simple, first-order language, incorporating 
object creation, field access and assignment, let-bound identifiers, synchronization expressions, 
and simple control flow. Each object creation point is labeled with a class key [GDD+97], 
which identifies the group of objects created at that point. In our implementation, there is a 
unique key for each new statement in the program; in other implementations a key could 
represent a class, or could represent another form of context sensitivity. We assume that all let- 
bound identifiers are given unique names. Static field references are modeled as references to a 
field of the special object global, which is implicitly passed to every procedure. We assume 
all procedures are put into an implicit global table before evaluating the main expression. The 
lookup function returns the λ-expression associated with a particular procedure. 

We model ordinary binary operators like + and ; (which evaluates and discards its first 
argument before returning the second) with the E1 op E2 syntax. Control flow operations 
include simple function calls and a functional if expression, facilities that can be combined 
to form other structures like loops and object-oriented dispatch. Finally, Java’s synchronization 
construct is modeled by a synchronized statement, which locks the object referred to by E1 
and then evaluates E2 before releasing the lock. Each synchronized statement in the 
program text is associated with a unique label LABEL that is used in our analyses. 
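
One possible in-memory encoding of this core language is sketched below in Java. All class and field names are hypothetical and chosen only for illustration; the paper does not prescribe a representation. The helper syncLabels shows how the unique labels on synchronized expressions can be collected by a simple traversal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical Java encoding of the core language of Figure 7.
abstract class Expr {
    // Direct sub-expressions; leaves have none.
    List<Expr> children() { return new ArrayList<Expr>(); }
}

final class NewExpr extends Expr {           // new^KEY
    final String key;
    NewExpr(String key) { this.key = key; }
}

final class IdExpr extends Expr {            // ID
    final String id;
    IdExpr(String id) { this.id = id; }
}

final class LetExpr extends Expr {           // let ID := E1 in E2
    final String id; final Expr bound; final Expr body;
    LetExpr(String id, Expr bound, Expr body) { this.id = id; this.bound = bound; this.body = body; }
    @Override List<Expr> children() { return Arrays.asList(bound, body); }
}

final class FieldReadExpr extends Expr {     // E.ID
    final Expr target; final String field;
    FieldReadExpr(Expr target, String field) { this.target = target; this.field = field; }
    @Override List<Expr> children() { return Arrays.asList(target); }
}

final class FieldWriteExpr extends Expr {    // E1.ID := E2
    final Expr target; final String field; final Expr value;
    FieldWriteExpr(Expr target, String field, Expr value) { this.target = target; this.field = field; this.value = value; }
    @Override List<Expr> children() { return Arrays.asList(target, value); }
}

final class OpExpr extends Expr {            // E1 op E2
    final String op; final Expr left; final Expr right;
    OpExpr(String op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    @Override List<Expr> children() { return Arrays.asList(left, right); }
}

final class SyncExpr extends Expr {          // synchronized^LABEL (E1) { E2 }
    final String label; final Expr lockExpr; final Expr body;
    SyncExpr(String label, Expr lockExpr, Expr body) { this.label = label; this.lockExpr = lockExpr; this.body = body; }
    @Override List<Expr> children() { return Arrays.asList(lockExpr, body); }
}

final class IfExpr extends Expr {            // if E1 then E2 else E3
    final Expr cond; final Expr thenBranch; final Expr elseBranch;
    IfExpr(Expr cond, Expr thenBranch, Expr elseBranch) { this.cond = cond; this.thenBranch = thenBranch; this.elseBranch = elseBranch; }
    @Override List<Expr> children() { return Arrays.asList(cond, thenBranch, elseBranch); }
}

final class CallExpr extends Expr {          // ID(E1, ..., En)
    final String fn; final List<Expr> args;
    CallExpr(String fn, List<Expr> args) { this.fn = fn; this.args = args; }
    @Override List<Expr> children() { return args; }
}

final class CoreLanguage {
    // Collects the labels of all synchronized expressions, outermost first.
    static List<String> syncLabels(Expr e) {
        List<String> out = new ArrayList<String>();
        collect(e, out);
        return out;
    }

    private static void collect(Expr e, List<String> out) {
        if (e instanceof SyncExpr) out.add(((SyncExpr) e).label);
        for (Expr child : e.children()) collect(child, out);
    }
}
```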



4.2 Analysis Context 

Our analyses are parameterized by other alias and class analyses, a feature of our approach that 
allows a tradeoff between analysis time and the precision of our analysis results. Our analyses 
also benefit from earlier copy propagation and must-alias analysis passes, which merge 
identifiers that point to the same object. We assume the following functions are defined from 
earlier analysis passes: 

id_aliases(e) - the set of identifiers that may point to the same value as 
expression e 

field_aliases(field) - the set of field declarations whose instances may 
point to the same object as field. This information can be 
easily computed from a class analysis. 

is_immutable(field) - true if field is immutable (i.e., write-once). 
This may be deduced from final annotations and constructor 
code. 

label_aliases(label) - the set of labels of synchronization statements 
that may lock the same object as the synchronization statement 
associated with label 

Some of our analyses deal with groups of objects, represented by class keys. We assume 
that an earlier class analysis pass has found a conservative approximation to the set of objects that can 
be in each variable or field in the program. Our implementation uses the 1-1-CFA algorithm 
[S88][GDD+97], which considers each procedure in one level of calling context and analyzes 
objects from different creation points separately, and the 0-CFA algorithm, which lacks this 
context-sensitivity. We use the following functions to access this information: 

field_keys(field, key) - the set of class keys to which field field may 
refer when accessed through a particular class key key 

static_field_keys(field) - the set of class keys to which static field 
field may refer 

label_keys(label) - the set of class keys that the synchronization 
expression associated with label may lock. 



4.3 Analyses 

Our analyses compute the following functions: 

get_locks(label) - the set of locks held at a particular synchronization 
point denoted by label. A lock is represented by a path of two 
kinds of field links, as described below. 

is_unshared(field) - true if field is unshared 

is_multithreaded(key) - true if objects described by key may be 
accessible through static variables 

We describe our first two analyses in syntax-directed form, where a semantic function maps 
an expression and a set of inherited attributes to a set of synthesized attributes. The third 
analysis uses only data from previous analyses and does not work directly over the program 
text. 

Lock Analysis. Figure 8 defines the domains and helper functions that are used by our lock 
analysis flow functions. Our lock analysis, shown in Figure 9, describes locks in terms of paths 
and bipaths. A path names a particular object relative to an identifier, and consists of the 
identifier name and a series of field accesses. Thus, the path id → field1 → field2 
represents the expression id.field1.field2. A bipath represents a bi-directional path. 
The forward links (→) represent field dereferences, as in paths, while the backward links (←) mean “is 
enclosed by”: in a bipath of the form bipath ← field, the expression denoted by 
bipath is referenced by the field field of some unspecified object. In our descriptions, we 
use the notation m[x ↦ y] to denote that we compute a new mapping m' ∈ X → Y that is identical 
to mapping m except that element x ∈ X is mapped to y ∈ Y. 

    path    ∈ PATH    = ID + PATH → ID
    dir     ∈ DIR     = { →, ← }
    bipath  ∈ BIPATH  = ID + BIPATH DIR ID
    lockset ∈ LOCKSET = 2^BIPATH
    lockmap ∈ LOCKMAP = LABEL → LOCKSET
    idmap   ∈ IDMAP   = ID → 2^PATH

    is_immutable_path(path) : bool
        switch (path)
            case id : true
            case path' → field : is_immutable(field) ∧ is_immutable_path(path')

    is_prefix(bipath1, bipath2) : bool
        if (bipath1 = bipath2) then true
        else if (bipath2 = id) then false
        else if (bipath2 = bipath' dir field) then is_prefix(bipath1, bipath')

    substitute(bipath1, path, bipath2) : BIPATH
        if (bipath1 = path) then bipath2
        else if (bipath1 = bipath' dir field)
            then substitute(bipath', path, bipath2) dir field
        else error

    map_lock(bipath1, path, bipath2) : BIPATH ∪ { not_defined }
        if (is_prefix(path, bipath1)) then substitute(bipath1, path, bipath2)
        else if (path = path' → field ∧ is_unshared(field))
            then map_lock(bipath1, path', bipath2 ← field)
        else not_defined

    map_lockset(lockset, path1, path2) : LOCKSET
        { map_lock(bipath, path1, path2) | bipath ∈ lockset } − { not_defined }

Fig. 8. Domains and Helper Functions for Lock Analysis 

The lock analysis function L accepts four arguments in curried style. The first argument is 
an expression from the text of the program. The second argument, a lockset, is the set of 
bipaths representing locks held at this program point. The third argument, a lockmap, is the 
current mapping from synchronization labels to sets of bipaths representing locks held at each 
labeled synchronization statement. The final argument, an idmap, is a mapping from identifiers 
to paths that describe the different field expressions that the identifier aliases. The result of 
lock analysis is a lockmap that summarizes the locks held at every reachable synchronization 
label in the program. We analyze the expression representing the program in the context of an 
empty lockset (no lock encloses the entire program expression), an optimistic lockmap (no 
synchronization points have been analyzed yet), and an empty idmap (no identifiers are in 
scope). 

Many of the analysis flow functions in Figure 9 are relatively straightforward; we discuss 
only the more subtle ones below. The rules for let and id expressions update the idmap for 
identifiers and return the pathset represented by an identifier. A field expression simply 
extends all paths in e’s pathset with field. 

    L : E → LOCKSET → LOCKMAP → IDMAP → 2^PATH × LOCKMAP

    get_locks(label) : LOCKSET =
        let (pathset', lockmap') = L[[program]] ∅ ∅ ∅ in lockmap'(label)

    L[[new^key]] lockset lockmap idmap = (∅, lockmap)

    L[[id]] lockset lockmap idmap = ({ id } ∪ idmap(id), lockmap)

    L[[let id := e1 in e2]] lockset lockmap idmap =
        let (pathset', lockmap') = L[[e1]] lockset lockmap idmap in
        L[[e2]] lockset lockmap' idmap[id ↦ pathset']

    L[[e.field]] lockset lockmap idmap =
        let (pathset', lockmap') = L[[e]] lockset lockmap idmap in
        ({ path → field | path ∈ pathset' }, lockmap')

    L[[e1.field := e2]] lockset lockmap idmap =
        let (pathset', lockmap') = L[[e2]] lockset lockmap idmap in
        let (pathset'', lockmap'') = L[[e1]] lockset lockmap' idmap in
        (∅, lockmap'')

    L[[e1 op e2]] lockset lockmap idmap =
        let (pathset', lockmap') = L[[e1]] lockset lockmap idmap in
        let (pathset'', lockmap'') = L[[e2]] lockset lockmap' idmap in
        (∅, lockmap'')

    L[[synchronized^label (e1) { e2 }]] lockset lockmap idmap =
        let (pathset', lockmap') = L[[e1]] lockset lockmap idmap in
        let lockmap'' = lockmap'[label ↦ ⋃ { map_lockset(lockset, path, SYNCH) | path ∈ pathset' }] in
        L[[e2]] (lockset ∪ { path | path ∈ pathset' ∧ is_immutable_path(path) }) lockmap'' idmap

    L[[if e1 then e2 else e3]] lockset lockmap idmap =
        let (pathset', lockmap') = L[[e1]] lockset lockmap idmap in
        let (pathset'', lockmap'') = L[[e2]] lockset lockmap' idmap in
        let (pathset''', lockmap''') = L[[e3]] lockset lockmap'' idmap in
        (pathset'' ∩ pathset''', lockmap''')

    L[[fn(e1, ..., en)]] lockset lockmap_0 idmap =
        let [[λ(formal_1, ..., formal_n) e]] = lookup(fn) in
        ∀ i ∈ 1..n  let (pathset_i, lockmap_i) = L[[e_i]] lockset lockmap_(i-1) idmap in
        let lockset' = ⋃ { map_lockset(lockset, path, formal_i) | i ∈ 1..n, path ∈ pathset_i } in
        let (pathset', lockmap') = context_strategy(L[[e]] lockset' lockmap_n ∅) in
        ({ substitute(path, formal_i, path')
           | path ∈ pathset' ∧ i ∈ 1..n s.t. is_prefix(formal_i, path) ∧ path' ∈ pathset_i },
         lockmap')

Fig. 9. Lock Analysis Flow Functions 

When a synchronization statement is encountered, the lockmap is updated with all of the 
bipaths in the lockset. Before being added to the lockmap, however, these bipaths are converted 
to a normal form in terms of e1, the expression on which the statement synchronizes. This 
normal form allows us to compare the bipath descriptions of the locks held at different 
synchronization points in the program in the lock elimination optimization described below. 

The normal form expresses a lock in terms of the special identifier SYNCH representing e1, 
the object being locked. The map_lockset function considers each bipath in the lockset in turn 
and uses map_lock to compute a new bipath in terms of a mapping from the pathset of e1 to 
SYNCH. For each bipath b in the lockset, map_lock will substitute SYNCH into 
b if the path of e1 is a prefix of b. For example, if the path corresponding to e1 is 
id → field1 and the lockset is { id → field1 → field2 }, then map_lock(id → 
field1 → field2, id → field1, SYNCH) = SYNCH → field2, signifying that the field 
field2 of the object referred to by e1 is already locked at this point. 

If the prefix rule does not apply and the last field field in the synchronization expression 
path is unshared, then map_lock will try to match against a shorter prefix with SYNCH ← 
field as the expression to be substituted. In Figure 4, the synchronization expression 
PrintWriter → out is not a prefix of the currently locked object PrintWriter → 
lock, so since out is an unshared field, map_lock will attempt to substitute SYNCH ← out 
for PrintWriter instead. Thus the result we get is map_lock(PrintWriter → lock, 
PrintWriter → out, SYNCH) = SYNCH ← out → lock. That is, at the current 
synchronization point the program holds a lock on the lock field of the object whose out field 
points to the object currently being synchronized. This is a correct description of the case in 
Figure 4. 
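
The helper functions of Figure 8 can be sketched in Java to make the two worked examples concrete. The class names Bipath and LockMapping, the use of null for not_defined, and the Predicate parameter standing in for is_unshared are all assumptions of this sketch rather than part of the paper's implementation.

```java
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical rendering of a bipath: a base identifier followed by links.
final class Bipath {
    final String id;        // set only for a base identifier
    final Bipath prefix;    // set only for a link
    final boolean forward;  // true: "->" (dereference); false: "<-" (is enclosed by)
    final String field;

    private Bipath(String id, Bipath prefix, boolean forward, String field) {
        this.id = id; this.prefix = prefix; this.forward = forward; this.field = field;
    }

    static Bipath of(String id) { return new Bipath(id, null, true, null); }
    Bipath fwd(String f) { return new Bipath(null, this, true, f); }
    Bipath back(String f) { return new Bipath(null, this, false, f); }
    boolean isId() { return prefix == null; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Bipath)) return false;
        Bipath b = (Bipath) o;
        return Objects.equals(id, b.id) && Objects.equals(prefix, b.prefix)
                && forward == b.forward && Objects.equals(field, b.field);
    }

    @Override public int hashCode() { return Objects.hash(id, prefix, forward, field); }

    @Override public String toString() {
        return isId() ? id : prefix + (forward ? "->" : "<-") + field;
    }
}

final class LockMapping {
    // is_prefix: true if b1 equals b2 or some prefix of b2.
    static boolean isPrefix(Bipath b1, Bipath b2) {
        if (b1.equals(b2)) return true;
        if (b2.isId()) return false;
        return isPrefix(b1, b2.prefix);
    }

    // substitute: replaces the prefix path of b1 by b2.
    static Bipath substitute(Bipath b1, Bipath path, Bipath b2) {
        if (b1.equals(path)) return b2;
        if (b1.isId()) throw new IllegalArgumentException("path is not a prefix");
        Bipath inner = substitute(b1.prefix, path, b2);
        return b1.forward ? inner.fwd(b1.field) : inner.back(b1.field);
    }

    // map_lock: returns null where Figure 8 returns not_defined.
    static Bipath mapLock(Bipath b1, Bipath path, Bipath b2, Predicate<String> isUnshared) {
        if (isPrefix(path, b1)) return substitute(b1, path, b2);
        if (!path.isId() && path.forward && isUnshared.test(path.field)) {
            return mapLock(b1, path.prefix, b2.back(path.field), isUnshared);
        }
        return null;
    }
}
```

With out treated as unshared, mapLock applied to the PrintWriter example walks back over the out link and then substitutes, which yields the bipath with one backward link followed by one forward link discussed above.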

Next, the expression inside the synchronization block is evaluated in the context of the 
current lockset combined with all paths in the synchronization expression’s pathset that are 
unique. The is_immutable_path function, which checks that each field in a path is 
immutable, ensures that no lock description is added to the lockset unless it uniquely identifies 
the locked object in terms of the base identifier. 

At function calls, we look up the definition of the called function and evaluate the actual 
parameters to produce a set of paths for each parameter and an updated lockmap. The 
map_lockset function is used to map actual paths to formal variables in each lock in the 
lockset. Information about locks that are not related to formal parameters (including the 
implicit formal parameter global mentioned in subsection 4.1) cannot be used by the callee, 
since there would be no way to ensure that the locked object protects any synchronization 
statements there. The callee is analyzed in the context of the new lockset and lockmap, and the 
result is memoized to avoid needless recomputation. 

Our analysis may be parameterized by a context_strategy function that allows a varying 
level of context sensitivity. The current implementation is context-insensitive — it simply 
computes the intersections of the incoming lockset with all other locksets and re-evaluates the 
callee in the context of the new lockset if the input information has changed since the last 
analysis. We avoid infinite recursion in our analysis by returning an empty pathset and the 
existing lockmap when a lock analysis flow function is called recursively with identical input 
analysis information; the analysis will automatically iterate until a sound fixpoint is reached. 
Since the lockset must decrease in size each time a function is reanalyzed, termination of our 
analysis is assured. Finally, the set of paths is returned from the function call by mapping back 
from formals to actuals. 
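
The bookkeeping behind such a context-insensitive strategy might look like the following Java sketch. It is a simplification under stated assumptions: locks are represented as plain strings rather than bipaths, and the class name CalleeLocksets and its methods are hypothetical. Because each merge can only shrink the stored set, a callee can be scheduled for reanalysis only finitely many times.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-function lockset summaries for a context-insensitive strategy.
final class CalleeLocksets {
    private final Map<String, Set<String>> summary = new HashMap<String, Set<String>>();

    // Intersects the incoming lockset with the summary stored for fn.
    // Returns true when the callee must be analyzed again: on the first
    // visit, or when the intersection removed at least one lock.
    boolean merge(String fn, Set<String> incoming) {
        Set<String> current = summary.get(fn);
        if (current == null) {
            summary.put(fn, new HashSet<String>(incoming));
            return true;
        }
        return current.retainAll(incoming); // true iff the stored set changed
    }

    // The lockset the callee is currently analyzed under (null if never called).
    Set<String> lockset(String fn) {
        return summary.get(fn);
    }
}
```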

    fset, shared ∈ FSET    = 2^ID
    idstate      ∈ IDSTATE = ID → FSET

    U : E → IDSTATE → FSET → FSET × IDSTATE × FSET

    is_unshared(field) =
        let (fset, idstate, shared) = U[[program]] ∅ ∅ in (field ∉ shared)

    U[[new^key]] idstate shared = (∅, idstate, shared)

    U[[id]] idstate shared = (idstate(id), idstate, shared)

    U[[let id := e1 in e2]] idstate shared =
        let (fset', idstate', shared') = U[[e1]] idstate shared in
        U[[e2]] idstate'[id ↦ fset'] shared'

    U[[e.field]] idstate shared =
        let (fset', idstate', shared') = U[[e]] idstate shared in
        (field_aliases(field), idstate', shared')

    U[[e1.field := e2]] idstate shared =
        let (fset', idstate', shared') = U[[e1]] idstate shared in
        let (fset'', idstate'', shared'') = U[[e2]] idstate' shared' in
        let fset''' = { field } ∪ fset'' in
        (fset''',
         idstate''[id ↦ idstate''(id) ∪ fset''' | id ∈ id_aliases(e2)],
         if (field ∉ fset'') then shared'' else shared'' ∪ { field })

    U[[e1 op e2]] idstate shared =
        let (fset', idstate', shared') = U[[e1]] idstate shared in
        let (fset'', idstate'', shared'') = U[[e2]] idstate' shared' in
        (fset' merge_op fset'', idstate'', shared'')

    U[[synchronized^label (e1) { e2 }]] idstate shared =
        let (fset', idstate', shared') = U[[e1]] idstate shared in
        U[[e2]] idstate' shared'

    U[[if e1 then e2 else e3]] idstate shared =
        let (fset', idstate', shared') = U[[e1]] idstate shared in
        let (fset'', idstate'', shared'') = U[[e2]] idstate' shared' in
        let (fset''', idstate''', shared''') = U[[e3]] idstate' shared' in
        (fset'' ∪ fset''', idstate'' ∪ idstate''', shared'' ∪ shared''')

    U[[fn(e1, ..., en)]] idstate_0 shared_0 =
        let [[λ(formal_1, ..., formal_n) e]] = lookup(fn) in
        ∀ i ∈ 1..n  let (fset_i, idstate_i, shared_i) = U[[e_i]] idstate_(i-1) shared_(i-1) in
        let idstate' = { formal_i ↦ fset_i | i ∈ 1..n } in
        let (fset'', idstate'', shared'') = context_strategy(U[[e]] idstate' shared_n) in
        (fset'',
         idstate_n[id ↦ idstate_n(id) ∪ idstate''(formal_i) | i ∈ 1..n and id ∈ id_aliases(e_i)],
         shared'')

Fig. 10. Unshared Field Analysis 

Unshared Field Analysis. The unshared field analysis described in Figure 10 computes the set 
of fields that are shared, i.e. may refer to objects that are also stored in other instances (that is, 
run-time occurrences) of the same field. Unshared fields are in the complement of this shared 
field set. The result of this analysis is used in the map_lock function of the previous analysis 
to detect enclosing locks. 

The information computed by unshared field analysis differs from the result of the 
field_aliases function in two essential ways. First, the field_aliases function cannot tell 
whether two instances of a given field declaration may point to the same object, which 
determines whether a given field is shared. Second, our unshared field analysis is 
flow-sensitive, enabling increased precision over non-flow-sensitive techniques. 

The analysis works by keeping track of which fields each identifier and expression could 
alias. When a field is assigned a value that may have originated in another instance of the same 
field, the analysis marks the field shared. U, the analysis function for unshared field analysis, 
accepts as curried parameters a program expression, the set of currently shared fields, and a 
mapping from identifiers to the sets of fields whose contents they may point to. It then 
computes the set of fields the expression may alias and an updated set of shared fields. Our 
analysis is run on the program’s top-level expression, using an initially empty identifier 
mapping (since no identifiers are initially in scope) and initially optimistically assuming that all 
fields are unshared. The rules for field references, field assignment, and function calls are the 
most interesting. 

When a field field is dereferenced, the resulting expression may alias any field in 
field_aliases(field). At assignments to a field field, we must update the identifier 
mapping for any identifier that could alias the expression being assigned to field, since the 
values these identifiers point to could also be referenced by field due to the assignment. In 
fact, due to the actions of other threads, these identifiers could alias any field in 
field_aliases(field). For the purposes of identifying unshared fields, however, we can 
optimistically assume that such aliasing does not occur when writing to a field. This enables 
our analysis to detect unshared fields even when the same object is written to two fields with 
different names. If this object is later copied from one field to another, the field written to will 
be correctly identified as shared because aliasing is accounted for when reading fields. If the 
expression being assigned may not alias the field being assigned, then the field being assigned 
may remain unshared; otherwise, it is added to the shared set. In expressions of the form 
e1 op e2, the correct merge function for the expression’s field set depends on the operator. For 
example, the merge function for the ; operator simply returns the field set of its second 
argument. 
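
The field-assignment rule can be illustrated with a small Java sketch. It covers only the step for an assignment e1.field := e2, given the field set already computed for e2 and the identifiers in id_aliases(e2); the class name UnsharedAssignment and the choice to mutate the identifier state and shared set in place are assumptions made for brevity.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper showing the field-assignment step of unshared field analysis.
final class UnsharedAssignment {
    // valueFields:  fields that the assigned expression e2 may alias
    // valueAliases: identifiers that may alias e2, i.e. id_aliases(e2)
    // idstate, shared: updated in place
    // Returns the field set of the assignment expression, {field} united with valueFields.
    static Set<String> assign(String field,
                              Set<String> valueFields,
                              Set<String> valueAliases,
                              Map<String, Set<String>> idstate,
                              Set<String> shared) {
        Set<String> result = new HashSet<String>(valueFields);
        result.add(field);

        // The value may have come from another instance of this field: mark it shared.
        if (valueFields.contains(field)) {
            shared.add(field);
        }

        // Every identifier aliasing the assigned value may now alias field as well.
        for (String id : valueAliases) {
            Set<String> fields = idstate.get(id);
            if (fields == null) {
                fields = new HashSet<String>();
                idstate.put(id, fields);
            }
            fields.addAll(result);
        }
        return result;
    }
}
```

Assigning the same identifier to the same field a second time, for instance through a different receiver, finds the field already in the identifier's field set and therefore marks the field shared.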

At a function call, we look up the callee and evaluate all the argument expressions to get a 
set of fields for each of them as well as an updated identifier map and shared field set. The 
idstate for the callee consists of a mapping from its formal parameters to the field sets of each 
actual parameter expression. We then evaluate the callee in the context of the new idstate and 
the current shared set, and return the resulting field set and shared set. After evaluating the 
callee, it is also necessary to update the identifier state of the caller. Every id that may alias an 
actual expression could now reference any field that the formal parameter of the callee could 
reference after evaluating the callee. This update is necessary because some callee (possibly 
several levels down the call graph) may have assigned the parameter’s value to a field. 

Our context_strategy for this analysis is context sensitive, as we re-evaluate the callee for 
each different identifier mapping. In practice, context sensitivity enables results that are much 
more precise. For example, when a callee is called with a formal parameter aliased to field 
at one call site, we don’t want all other call sites to see that the formal may alias field after 
the call and thus conservatively assume that the callee assigned that formal to field. 
Termination is assured because the results of each analysis are memoized, and the size of the 
field sets is bounded by the number of fields in the program. Recursive functions are handled 
by optimistically returning the empty set of fields at recursive calls, and the analyses 
subsequently iterate until a sound fixpoint is reached. 

    multi = ( ⋃ { static_field_keys(f) | f ∈ staticFields } )
          ∪ ( ⋃ { field_keys(f, k) | k ∈ multi, f ∈ fields(k) } )

Fig. 11. Multithreaded Object Analysis 

Multithreaded Object Analysis. We define multi, a set of class keys, as the smallest set 
satisfying the recursive equation shown in Figure 11. Then we define is_multithreaded as 
follows: 



    is_multithreaded(key) = key ∈ multi 

Our implementation simply starts with class keys referenced by static fields, and for each 
class key it considers each field of that key and adds the keys that field may reference to the set 
multi. When the set reaches the least fixed point, the analysis is complete. The analysis must 
terminate because there is a finite number of class keys to consider. 
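
This fixed-point computation is a standard reachability worklist, sketched below in Java. The representation is an assumption of the sketch: class keys are strings, staticKeys is the union of static_field_keys(f) over all static fields, and keysReachableFrom maps each key k to the union of field_keys(f, k) over the fields f of k.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical worklist computation of the set multi from Figure 11.
final class MultithreadedKeys {
    static Set<String> compute(Set<String> staticKeys,
                               Map<String, Set<String>> keysReachableFrom) {
        Set<String> multi = new HashSet<String>(staticKeys);
        Deque<String> worklist = new ArrayDeque<String>(staticKeys);

        while (!worklist.isEmpty()) {
            String key = worklist.pop();
            Set<String> next = keysReachableFrom.get(key);
            if (next == null) {
                continue; // key has no fields that refer to objects
            }
            for (String reached : next) {
                // Each key enters the worklist at most once, so the loop terminates.
                if (multi.add(reached)) {
                    worklist.push(reached);
                }
            }
        }
        return multi;
    }
}
```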



4.4 Applying the Results 

To apply the results of our analyses, we perform an optimization pass during code generation. 
At each statement of the form 

    synchronized^label (e1) { e2 } 

we replace it with the statement e1; e2 if any of the following conditions holds: 

1. SYNCH ∈ get_locks(label), or 

2. ∀ label' ∈ label_aliases(label) . 
   get_locks(label) ∩ get_locks(label') ≠ ∅, or 

3. ∀ key ∈ label_keys(label) . ¬is_multithreaded(key) 

The first condition represents a reentrant monitor: if the monitor associated with expression e1 
is already locked, then SYNCH ∈ get_locks(label). Here get_locks(label) is defined (in 
Figure 9) to be the result of lock analysis at the program point identified by label. We can 
safely replace the synchronized expression with a sequence that evaluates the lock expression 
(for potential side effects) and then evaluates and returns the expression protected within the 
synchronization statement. The second condition represents the generalization to enclosed 
locks: a synchronization statement S may be eliminated if, for every other synchronization 
statement S’ that may lock the same object, some common lock is already held at both S and S'. 
The third condition removes synchronization statements that synchronize on an expression that 
refers only to non-multithreaded class keys. 
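
The three conditions can be combined into a single predicate, sketched here in Java. The analysis results are passed in as plain maps keyed by label, locks are represented by their normal-form strings, and the sketch assumes that a label always counts as an alias of itself; these representation choices and the name SyncElimination are assumptions, not the paper's implementation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical predicate deciding whether a synchronized statement can be removed.
final class SyncElimination {
    static final String SYNCH = "SYNCH";

    static boolean canRemove(String label,
                             Map<String, Set<String>> locks,        // get_locks
                             Map<String, Set<String>> labelAliases, // label_aliases
                             Map<String, Set<String>> labelKeys,    // label_keys
                             Set<String> multi) {                   // multithreaded keys
        Set<String> held = lookup(locks, label);

        // Condition 1: reentrant monitor.
        if (held.contains(SYNCH)) return true;

        // Condition 2: a common enclosing lock is held at every aliasing statement.
        Set<String> others = new HashSet<String>(lookup(labelAliases, label));
        others.add(label); // assumption: a label aliases itself
        boolean enclosed = true;
        for (String other : others) {
            if (Collections.disjoint(held, lookup(locks, other))) {
                enclosed = false;
                break;
            }
        }
        if (enclosed) return true;

        // Condition 3: only thread-local class keys are ever locked here.
        for (String key : lookup(labelKeys, label)) {
            if (multi.contains(key)) return false;
        }
        return true;
    }

    private static Set<String> lookup(Map<String, Set<String>> map, String key) {
        Set<String> value = map.get(key);
        return value == null ? Collections.<String>emptySet() : value;
    }
}
```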

Due to the complicated semantics of monitors in Java, our optimizations may not conform to 
the Java specification on some multiprocessor systems. According to the Java language 
specification, “locking any lock conceptually flushes all variables from a thread's working 
memory, and unlocking any lock forces the writing out to main memory of all variables that the 
thread has assigned.” [GJS96] This implies, for example, that a legal Java program may pass 
data (in a timing-dependent manner) from one thread to another by having each thread 
synchronize on a thread-local object. This kind of “covert channel” communication could be 
broken by our optimizations. An implementation that synchronizes the caches of a 
multiprocessor even when other parts of a synchronization operation have been removed would 
comply with the Java specification, for example. Our optimizations are always safe, however, 
in a Java-like language with a somewhat looser synchronization guarantee which could be 
informally stated as follows: if thread T1 writes to a variable V and then unlocks a lock and 
thread T2 locks the same lock and reads variable V, then thread T2 will read the value that T1 
wrote. We believe that most well written multithreaded programs in Java use this model of 
synchronization. 



5. Results 



A preliminary performance evaluation shows that a subset of our analysis is able to eliminate 
30-70% of the synchronization overhead in several of our benchmarks. We have implemented 
prototype versions of reentrant lock analysis and multithreaded object analysis, and 
transformations that use the results of these analyses. Our implementation does not yet apply 
specialization to optimize different instances of an object or method separately. Although our 
results are preliminary, they demonstrate the promise of our approach. We plan to complete 
and evaluate a more robust and detailed implementation in the future, which will include 
unshared field analysis and enclosed lock analysis. 

We demonstrate the performance benefit of our analyses on the five singlethreaded 
benchmarks presented earlier. While these benchmarks could be optimized trivially by 
removing all synchronization, they are real programs and may be partly representative of how 
synchronization is used in multithreaded programs as well. Javac, javacup, jlex, and pizza are 
all compiler tools; cassowary is a constraint solver. We hope to evaluate our techniques on 
multithreaded programs in the future. 

Our prototype implementation is built on the Vortex compiler infrastructure [DDG+96] 
augmented with a simple, portable, non-preemptive, user-level threading package based on 
QuickThreads [K93]. We compiled all programs with a full suite of conventional 
optimizations, as well as interprocedural class analysis. For our small benchmarks, we used a 
1-1-CFA call graph construction algorithm [GDD+97]; this did not scale well to pizza, javac, 
and javacup, so we used a simpler 0-CFA analysis for these programs, possibly missing some 
optimization opportunities due to more conservative alias information. Our baseline lock 
implementation is already highly optimized: it uses the efficient thin-lock scheme of [BKM+98]. 
We compiled two versions — one with and one without our synchronization optimizations. 
Both versions included all other Vortex optimizations. All our runtime overhead measurements 
come from the average of five runs on a SPARC ULTRA 2 machine with 512 MB of memory. 
We ran the benchmarked program once before the data were collected to eliminate cold cache 
startup effects. 

Table 1 shows statistics about how our analyses performed. The first two columns show the 
total number of classes in the program and the number identified as thread-local. Multithreaded 
object analysis identified a large fraction of classes as singlethreaded for the jlex, javacup, and 
cassowary benchmarks, but was less successful for javac or pizza. Since these benchmarks are 
singlethreaded, all their classes are thread-local. However, because our analysis assumes static 
field references make a class reachable by other threads, our analysis is only able to determine 
this for a subset of the classes in each program. 

Table 1. Synchronization Analysis Statistics 

            |       classes        | total    |       lock ops removed        | % overhead
Benchmark   | total   thread-local | lock ops | reentrant  thread-local total | removed
------------+----------------------+----------+-------------------------------+-----------
jlex        |    56             34 |       27 |         2             8     9 | 67%
pizza       |   184              0 |       38 |         6             0     6 | N/A
javacup     |    66             28 |       30 |         2             6     7 | 47%
cassowary   |    57             29 |       32 |         4            12    13 | 27%
javac       |   194              0 |       68 |         5             0     5 | 0%

The next four columns of Table 1 show the total (static) number of synchronization 
operations, the number removed by reentrant lock analysis, the number of thread-local 
operations removed, and the total number of operations removed. The total is less than the sum 
from the two analyses because some synchronization operations were removed by both 
analyses. As suggested by the class figures in the first two columns, multithreaded object 
analysis was more effective than reentrant lock analysis for jlex, javacup, and cassowary, while 
pizza and javac only benefited from reentrant lock analysis. In general, our analyses removed 
20-40% of the static synchronization operations in the program. 
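
The relationship between the three "lock ops removed" columns is plain inclusion-exclusion, and it can be checked directly from Table 1. The following Python sketch recomputes, for each benchmark, how many static operations were removed by both analyses and what fraction of the static operations was removed in total. The dictionary layout and the function names are illustrative assumptions, not part of the authors' tooling.

```python
# Static counts copied from Table 1, per benchmark:
# (total lock ops, removed by reentrant lock analysis,
#  removed by thread-local analysis, total removed).
TABLE1 = {
    "jlex": (27, 2, 8, 9),
    "pizza": (38, 6, 0, 6),
    "javacup": (30, 2, 6, 7),
    "cassowary": (32, 4, 12, 13),
    "javac": (68, 5, 0, 5),
}


def removed_by_both(reentrant, thread_local, total_removed):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return reentrant + thread_local - total_removed


def fraction_removed(total_ops, total_removed):
    """Fraction of static synchronization operations eliminated."""
    return total_removed / total_ops


for name, (ops, reentrant, local, removed) in TABLE1.items():
    both = removed_by_both(reentrant, local, removed)
    share = fraction_removed(ops, removed)
    print(f"{name}: {both} removed by both analyses, {share:.0%} of static ops removed")
```

These are static counts only; they say nothing about how often each removed operation executes, which is what the last column of Table 1 measures.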

The last column summarizes our runtime performance results. We present the speedup 
achieved by our optimizations as a percentage of the overhead of synchronization for Vortex. 
For jlex, javacup, and cassowary, we eliminated a significant percentage of the synchronization 
overhead, approaching 70% in the case of jlex. The absolute speedups ranged up to 5% in the 
case of jlex. Pizza did not have a significant overhead from synchronization, so no speedup 
was achievable. We also got no measurable speedup on javac. 

The speedup in the case of jlex is due to the large number of stack operations performed by this 
benchmark, which our analysis optimized effectively. Multithreaded analysis discovered that 
all of the Stack objects were thread-local, and lock analysis was successful in removing some 
reentrant locks in the Stack code. Most of the remaining synchronization is on 
DataOutputStream and BufferedOutputStream objects. Multithreaded object 
analysis determined that DataOutputStream was thread-local and that the most important 
instances of BufferedOutputStream were thread-local, but because our implementation 
does not yet produce specialized code for instances of BufferedOutputStream that are 
thread-local, we were unable to take advantage of this knowledge. Implementing specialization 
would improve our optimization performance here. 

Over 99% of javacup's synchronization comes from manipulation of strings, bitsets, stacks, 
hashtables, and I/O streams. Multithreaded analysis was able to remove synchronization from 
every method of StringBuffer, but did not eliminate synchronization from other 
objects. Each of the other classes was reachable from a static variable, either in the Java library 
or in the javacup application code. 

To optimize this code effectively would require three additional elements. First, we need a 
scalable analysis that distinguishes program creation points so that one multithreaded 
Hashtable does not make all Hashtables multithreaded. Our current 1-1-CFA analysis 
that distinguishes creation points does not scale to javacup or javac, and therefore our 
performance suffers for both benchmarks. Second, we need specialization to optimize different 
instances of the same class separately. Third, we need a more effective multithreaded analysis 
that can determine if a static variable is only used by one thread, rather than conservatively 
assuming all such variables are multithreaded. 

In Cassowary, multithreaded analysis was able to remove synchronization from all the 
methods of Vector. However, the primary source of synchronization overhead was 
Hashtable, which was not optimized by our multithreaded analysis because it was reachable 
from static fields. 

Although a few operations were optimized in javac, we did not measure any speedup in this 
benchmark. Since javac executes many operations on enclosed monitors, we expect these 
results to improve once we have implemented our unshared field analysis and enclosed lock 
analysis. 

Considering that we have achieved a significant fraction of the potential speedup for several 
of our benchmarks although many important elements of our analyses are not yet implemented, 
we find these results promising. 



6. Related Work 



A large body of work (e.g., [ALL89] [KP98]) has focused on reducing the overhead of locking 
or synchronization operations. Most recently, Bacon’s Thin Locks [BKM+98] reduce the 
overhead of Java’s synchronization to a few instructions in the common case. Thin locks 
improve the performance of real programs by up to 70% by reducing the latency of individual 
synchronization operations. Our analyses complement this work by reducing the number of 
synchronization operations. 

Diniz and Rinard [DR98] present two techniques for lock coarsening in parallelizing 
compilers: merging multiple locks into one, so that several objects are protected by one lock, 
and transforming locks that are repeatedly acquired and released within a method so that they 
are only acquired and released once. Their work is applicable to explicitly parallel programs; 
however, they do not evaluate their optimizations in this context. They do not consider thread- 
local locks, do not consider immutable fields as a potential source of lock nesting, and 
apparently can only optimize nested locks in languages like C++ where objects can be statically 
declared to be represented inline. Their coarsening optimizations are complementary to our 
work; while we can eliminate a broader class of redundant locks, their optimizations may lead 
to acquiring the non-redundant locks fewer times. 

Another source of related work is the Concert project at the University of Illinois. To 
reduce the overhead of lock operations, they optimize calls from one method to another on the 
same receiver by eliminating the lock operation from the second method during inlining 
[PZC95]. They also do a lock coarsening optimization similar to that in [DR98]. Our research 
extends and generalizes their results by optimizing enclosing locks and thread-local objects. 

Our concept of an unshared field is similar to the idea of a unique pointer [M96] or unique 
aliasing mode [H91][NVP98]. Unlike the previous work, we find unshared fields 
automatically, rather than requiring annotations from the programmer. Our unshared field 
analysis is similar to an analysis used by Dolby to inline object fields [D97]. In order to safely 
inline a field, his system propagates tags to determine which fields could alias particular 
variables. The precision of his analysis is identical to ours given a similar analysis framework, 
but his work requires more strict conditions to inline a field than ours requires to identify an 
unshared field. 

Work from the model-checking community [C98] performs shape analyses similar to ours in 
order to simplify models of concurrent systems. These analyses remove recursive locks and 
locks on thread-local objects from formal models. This allows a model checker to more easily 
reason about the concurrency properties of a Java program. An analysis similar to enclosing 
lock analysis is also performed, not to eliminate enclosed locks, but to reason about which 
objects might be subject to concurrent (unprotected) access. The analyses are intraprocedural, 
and thus are only applicable to small programs where all methods are inlined. The work does 
not describe the analyses precisely, nor does it consider the potential performance 
improvements of removing unnecessary synchronization. Our work precisely describes a 
family of interprocedural analyses for removing unnecessary synchronization and provides an 
initial evaluation of their effects on a set of benchmarks. 

The Extended Static Checking System [DRL+98] allows a programmer to specify a locking 
protocol using code annotations. The program is then checked for simple errors such as 
deadlocks or race conditions. This system complements ours by focusing on the correctness of 
the source code, while our analyses increase the efficiency of the generated code. 



7. Conclusion 



This paper presented a set of interprocedural static analyses that effectively detect and eliminate 
unnecessary synchronization. These analyses identify excess synchronization operations due to 
reentrant locks, enclosed locks, and thread-local locks. A partial implementation of our 
analyses eliminates 30-70% of synchronization overhead on three Java benchmarks. Our 
optimizations support a style of programming in which synchronization code is written for 
software engineering objectives rather than hand-optimized for efficiency. 



Acknowledgements 



This work has been supported in part by a National Defense Science and Engineering Graduate 
Fellowship from the Department of Defense, NSF grant CCR-9503741, NSF Young 
Investigator Award CCR-9457767, and gifts from Sun Microsystems, IBM, Xerox PARC, 
Object Technology International, Edison Design Group, and Pure Software. We appreciate 
feedback and pointers to related work from David Grove, Martin Rinard, William Chan, 
Satoshi Matsuoka, members of the Vortex group, and the anonymous reviewers. We also thank 
the authors of our benchmarks: JavaSoft (javac), Philip Wadler (pizza), Andrew Appel (jlex and 
javacup), and Greg Badros (cassowary). 



References 



[ALL89] T. E. Anderson, E. D. Lazowska and H. M. Levy. The Performance Implications of 
Thread Management Alternatives for Shared-Memory Multiprocessors. IEEE Transactions 
on Computers 38(12), December 1989, pp. 1631-1644. 

[BKM+98] D. Bacon, R. Konuru, C. Murthy, M. Serrano. Thin Locks: Featherweight 
Synchronization for Java. In Proceedings of the 1998 Conference on Programming 
Language Design and Implementation, Montreal, Canada, June 1998. 

[BW88] H. Boehm and M. Weiser. Garbage Collection in an Uncooperative Environment. 
Software Practice & Experience, September 1988, pp. 807-820. 

[C98] J. Corbett. Using Shape Analysis to Reduce Finite-State Models of Concurrent Java 
Programs. In Proceedings of the International Symposium on Software Testing and 
Analysis, March 1998. A more recent version is University of Hawaii ICS-TR-98-20, 
available at http://www.ics.hawaii.edu/~corbett/pubs.html. 

[DDG+96] J. Dean, G. DeFouw, D. Grove, V. Litvinov, and C. Chambers. Vortex: An 
Optimizing Compiler for Object-Oriented Languages. In Proceedings of the Eleventh 
Conference on Object-Oriented Programming, Systems, Languages, and Applications, 
October 1996. 







[DRL+98] David L. Detlefs, K. Rustan M. Leino, Greg Nelson, and James B. Saxe. Extended 
Static Checking. Compaq SRC Research Report 159, 1998. 

[DR98] P. Diniz and M. Rinard. Lock Coarsening: Eliminating Lock Overhead in 
Automatically Parallelized Object-based Programs. In Journal of Parallel and Distributed 
Computing, Volume 49, Number 2, March 1998, pp. 218-244. 

[D97] J. Dolby. Automatic Inline Allocation of Objects. In Proceedings of the 1997 ACM 
SIGPLAN Conference on Programming Language Design and Implementation, June 1997. 

[GMS77] C. M. Geschke, J. H. Morris and E. H. Satterthwaite. Early Experiences with Mesa. 
Communications of the Association for Computing Machinery, 20(8), August 1977, pp. 
540-553. 

[GJS96] J. Gosling, B. Joy, and G. Steele. The Java Language Specification. Addison-Wesley, 
1996. 

[GDD+97] D. Grove, G. DeFouw, J. Dean, and C. Chambers. Call Graph Construction in 
Object-Oriented Languages. In Proceedings of the 12th Conference on Object-Oriented 
Programming, Systems, Languages, and Applications, 1997. 

[H91] J. Hogg. Islands: Aliasing Protection in Object-Oriented Languages. In Proceedings of 
the Sixth Conference on Object-Oriented Programming, Systems, Languages, and 
Applications, November 1991. 

[K93] D. Keppel. Tools and Techniques for Building Fast Portable Thread Packages. 
University of Washington Technical Report UW CSE 93-05-06, May 1993. 

[KP98] A. Krall and M. Probst. Monitors and Exceptions: How to Implement Java Efficiently. 
ACM 1998 Workshop on Java for High-Performance Network Computing, 1998. 

[LR80] B. Lampson and D. Redell. Experience with Processes and Monitors in Mesa. In 
Communications of the Association for Computing Machinery 23(2), February 1980, pp. 
105-117. 

[M96] N. Minsky. Towards Alias-Free Pointers. In Proceedings of the 10th European 
Conference on Object Oriented Programming, Linz, Austria, July 1996. 

[NVP98] J. Noble, J. Vitek, and J. Potter. Flexible Alias Protection. In Proceedings of the 12th 
European Conference on Object Oriented Programming, Brussels, Belgium, July 1998. 

[PZC95] J. Plevyak, X. Zhang, and A. Chien. Obtaining Sequential Efficiency for Concurrent 
Object-Oriented Languages. In Proceedings of the 22nd Symposium on Principles of 
Programming Languages, San Francisco, CA, January 1995. 

[S88] Olin Shivers. Control-Flow Analysis in Scheme. SIGPLAN Notices, 23(7):164-174, 
July 1988. In Proceedings of the ACM SIGPLAN '88 Conference on Programming 
Language Design and Implementation. 

[SNR+97] S. Singhal, B. Nguyen, R. Redpath, M. Fraenkel, and J. Nguyen. Building High- 
Performance Applications and Services in Java: An Experiential Study. IBM T.J. Watson 
Research Center white paper, available at 
http://www.ibm.com/java/education/javahipr.html, 1997. 

[SGA+98] E. G. Sirer, A. J. Gregory, N. R. Anderson, B. N. Bershad. Distributed Virtual 
Machines: A System Architecture for Network Computing. In Proceedings of the Eighth 
ACM SIGOPS European Workshop, September 1998. 




Dynamic Partitioning 
in Analyses of Numerical Properties * 



Bertrand Jeannet, Nicolas Halbwachs, and Pascal Raymond 
Verimag**, Grenoble - France 

{Bertrand.Jeannet, Nicolas.Halbwachs, Pascal.Raymond}@imag.fr 



Abstract. We apply linear relation analysis [CH78,HPR97] to the verification 
of declarative synchronous programs [Hal98]. In this approach, 
state partitioning plays an important role: on one hand the precision of 
the results highly depends on the fineness of the partitioning; on the 
other hand, an overly detailed partitioning may result in an exponential 
explosion of the analysis. In this paper, we propose to dynamically 
select a suitable partitioning according to the property to be proved. 



1 Introduction 

Partitioning in abstract interpretation: We address the standard case of abstract 
interpretation [CC77], where a set $S$ of concrete states is considered, and where 
a complete lattice $(L, \sqsubseteq, \sqcap, \sqcup)$ of abstract values is associated with the powerset 
$2^S$ by means of a Galois connection $(\alpha, \gamma)$. In this context, partitioning a fixpoint 
equation $X = F(X)$, where $F$ is a monotone function from $L$ to $L$, consists 
in choosing a finite partition (or, more generally, a finite covering) $\{S_1, \ldots, S_k\}$ 
of $S$ (i.e., such that $S = \bigcup_{i=1}^{k} S_i$), and replacing the initial equation by the 
system of equations 

$$X_i = F_i(X_1, \ldots, X_k), \quad i = 1 \ldots k$$

where $X_i = X \sqcap L_i$ and $L_i = \alpha(S_i)$. In general, the advantages of partitioning 
are that the component functions $(F_i)_{i=1..k}$ are simpler than $F$, and that the 
system can be more efficiently solved, using chaotic iterations. It can be even 
more interesting in two common cases: 

- When a widening is necessary to ensure the termination of iterations, the 
widening only needs to be applied on some equations (cutting dependence 
loops [Bou92b,CC92b]). Thus, less information is lost because of the widening. 

- In lattices where the least upper bound (lub) operation $\sqcup$ loses information, 
i.e., when 

$$\gamma(X) \cup \gamma(Y) \subsetneq \gamma(X \sqcup Y)$$

* This work was partially supported by the ESPRIT-LTR project “SYRF”. 

** Verimag is a joint laboratory of Universite Joseph Fourier (Grenoble I), CNRS and 

initial not b0 and not b1 and (x=0) and (y=0); 
transition 

b0' = not b1; 
b1' = b0; 

x' = if b0=b1 then x+1 else x; 
y' = if b0=b1 then y else y+1; 



Fig. 1. An example “program” 



partitioning allows some operands of a lub operation to be considered separately 
(i.e., considering $X_i$ and $X_j$ instead of $X_i \sqcup X_j$), thus representing the 
exact union of concrete values ([Bou92a] introduces the term “dynamic partitioning” 
in this context). 

Examples of such cases are intervals [CC76] or linear relations [CH78] analyses. 

In the analysis of imperative sequential programs, a natural partitioning is 
given by the control structure of the program. A concrete state is a pair (c, v), 
where c is a control location, and v is a valuation of variables (memory state), 
and concrete states are generally partitioned according to their c components. 

We are interested in the verification of synchronous data-flow programs, using 
abstract interpretation. Such programs don’t present any control structure, but 
the trivial one: a single loop around a unique control location, which results in 
no partitioning at all. In order to get a more interesting partitioning, we used 
the control automaton, produced by some compilers by performing a partial 
evaluation [JGS93] of all Boolean variables of the program [Hal98]. 

Let us consider a trivial example: the “program” shown on Fig. 1 is, basically, 
a system of equations defining the dynamic behavior of a set of Boolean 
(“b0, b1”) and integer (“x, y”) variables. Their possible initial states are specified 
by a formula (“initial”), then each equation defines the “next” (primed) 
value of a variable as an expression of the current values. Fig. 2 (a) and (b) show 
two imperative versions of this program: Fig. 2.a is the trivial control structure, 
while Fig. 2.b shows the automaton obtained by a partial evaluation of Boolean 
variables. 
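
The behavior of this small system can be observed by executing it. The Python sketch below is a direct transcription of the equations of Fig. 1 (the function names `step` and `run` are illustrative); it runs a few synchronous steps, collects the Boolean states visited, and checks on that finite trace that x never drops below y. A bounded simulation of this kind is only a sanity check, not a proof of the property.

```python
def step(state):
    """One synchronous step of the Fig. 1 equations: every primed value
    is computed from the current (unprimed) values."""
    b0, b1, x, y = state
    same = (b0 == b1)
    return (not b1, b0, x + 1 if same else x, y if same else y + 1)


def run(n_steps):
    """Return the trace of n_steps + 1 states starting from the initial state."""
    state = (False, False, 0, 0)  # not b0 and not b1 and x=0 and y=0
    trace = [state]
    for _ in range(n_steps):
        state = step(state)
        trace.append(state)
    return trace


trace = run(12)
boolean_states = {(b0, b1) for b0, b1, _, _ in trace}
print(len(boolean_states))                   # 4: the Boolean part cycles through four states
print(all(x >= y for _, _, x, y in trace))   # True on this finite trace
```

The four Boolean states visited correspond to the four control locations that partial evaluation of the Boolean variables would produce.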

The problem is that the automaton produced by partial evaluation may have 
an exponential number of control locations (Boolean states). Of course, partitioning 
a fixpoint equation into a system with millions of equations is not realistic. 
Moreover, the obtained control structure is not always suitable to obtain precise 
enough results: in some cases, control locations should be split again, to delay 
the application of a lub operator. For instance, in the example above, if our goal 
is to prove, by linear relation analysis, that x is always greater than or equal to y, the 
control structure of Fig. 2.a is too coarse (the behavior of Boolean variables is 
ignored); conversely, the partitioning of Fig. 2.b is too detailed: we could 
complicate the program, with n Boolean variables, and get a ring of $2^n$ locations, 

[Figure 2: (a) the trivial control structure, a single loop executing "if b0=b1 then x:=x+1 else y:=y+1; (b0,b1) := (not b1,b0)" after the initialization x:=y:=0; (b) the automaton obtained by partial evaluation of the Boolean variables; (c) an automaton with only 2 locations] 

Fig. 2. Control structures for the example program 



while the property can indeed be shown on an automaton with only 2 locations 
(Fig. 2.c). 

We use abstract interpretation (more precisely, Linear Relation Analysis) 
to verify safety properties of programs, i.e., to show that some “bad” 
states cannot be reached during the program execution. In this context, and as 
a solution to the above problem, we propose to dynamically choose the partitioning 
according to the property we want to prove. Dynamic partitioning consists 
in starting from a very coarse control structure (basically, distinguishing only 
between initial states, bad states, and others), and refining it according to heuristics 
until the unreachability of bad states from initial ones has been shown. 
Our goal is to approach the simplest control structure that suffices to prove the 
property. 



2 Definitions and Notations 

Let us note $\mathbb{B} = \{T, F\}$ the set of Boolean values, and $\mathcal{N}$ the set of numerical 
values (which may be integer and/or rational numbers). For a program with $p$ 
Boolean variables and $q$ numerical variables, the set $S$ of concrete states is 
$\mathbb{B}^p \times \mathcal{N}^q$. The program defines a set $S_{init}$ of initial states, and a transition relation¹ 
$\rightarrow \subseteq S \times S$. Moreover, for the sake of verification, the program is supposed to be 
equipped with an “invariant”, i.e., a formula specifying the property one wants 
to prove. We note $S_{error}$ the set of “bad” states that violate the invariant. The 
goal is to show that there is no path $s_0, s_1, \ldots, s_n$ with $s_0 \in S_{init}$, $s_n \in S_{error}$, 
and for all $i = 1..n$, $s_{i-1} \rightarrow s_i$. For instance, Fig. 3 shows our example program 
augmented with an “invariant”, which becomes false whenever x<y. 

¹ In our implementation, we consider deterministic programs with inputs, instead of 
the non-deterministic ones which are considered here, for simplicity. 

initial not b0 and not b1 and (x=0) and (y=0) and ok; 
transition 

b0' = not b1; 
b1' = b0; 

x' = if b0=b1 then x+1 else x; 
y' = if b0=b1 then y else y+1; 
ok' = ok and (x>=y); 
invariant ok; 



Fig. 3. A program with a property 



The lattice of abstract values we consider is $L = 2^{\mathbb{B}^p} \times L_{\mathcal{N}}$, where $L_{\mathcal{N}}$ is a set 
of abstract values for numerical domains: there is a Galois connection $(\alpha_{\mathcal{N}}, \gamma_{\mathcal{N}})$ 
between $2^{\mathcal{N}^q}$ and $L_{\mathcal{N}}$. Throughout the paper, $L_{\mathcal{N}}$ may be either the lattice 
of tuples of intervals (interval analysis), or the lattice of convex 
polyhedra of $\mathcal{N}^q$ (linear relation analysis). We assume the existence of standard 
operations $\sqcap_{\mathcal{N}}$, $\sqcup_{\mathcal{N}}$, $\nabla_{\mathcal{N}}$ in $L_{\mathcal{N}}$. 

An abstract value is thus a pair $(f, D)$, where $f$ is a Boolean formula representing 
a subset of $\mathbb{B}^p$, and $D \in L_{\mathcal{N}}$. It represents the set of concrete states 

$$\gamma(f, D) = \{(b, v) \mid b \models f,\ v \in \gamma_{\mathcal{N}}(D)\}$$

Operations on abstract values are derived straightforwardly: 

$$(f_1, D_1) \sqcap (f_2, D_2) = (f_1 \wedge f_2,\ D_1 \sqcap_{\mathcal{N}} D_2)$$

$$(f_1, D_1) \sqcup (f_2, D_2) = (f_1 \vee f_2,\ D_1 \sqcup_{\mathcal{N}} D_2)$$

$$(f_1, D_1) \nabla (f_2, D_2) = (f_1 \vee f_2,\ D_1 \nabla_{\mathcal{N}} D_2)$$

Notice that the lub operator $\sqcup$ loses information, not only because $\sqcup_{\mathcal{N}}$ loses 
information in the numerical lattices we consider, but also because the Boolean 
and numerical parts are considered separately: $(b \wedge x \geq 0) \sqcup (\neg b \wedge x < 0) = (T, \top)$ 
instead of $(b = (x \geq 0))$. 
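
The loss of information described above can be reproduced on a toy version of the abstract domain. In the Python sketch below, which is an illustration under simplifying assumptions and not the authors' implementation, the Boolean part is a set of valuations of a single variable b, and the numerical part is a single interval for an integer variable x (so that x < 0 is written x <= -1). The names `join` and `contains` are hypothetical.

```python
import math


def join(a, b):
    """Componentwise lub: union of the Boolean parts, interval hull of the
    numerical parts."""
    (f1, (lo1, hi1)), (f2, (lo2, hi2)) = a, b
    return (f1 | f2, (min(lo1, lo2), max(hi1, hi2)))


def contains(value, b, x):
    """Membership in the concretization: b satisfies f and x lies in D."""
    f, (lo, hi) = value
    return b in f and lo <= x <= hi


pos = (frozenset({True}), (0, math.inf))      # b and x >= 0
neg = (frozenset({False}), (-math.inf, -1))   # not b and x < 0, for integer x
both = join(pos, neg)

# The concrete state (b=False, x=5) belongs to neither operand...
print(contains(pos, False, 5) or contains(neg, False, 5))  # False
# ...but it belongs to the join, which forgets how b and x are related.
print(contains(both, False, 5))                            # True
```

Keeping `pos` and `neg` in two separate locations, instead of joining them, is exactly what partitioning buys.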

With these tools, we are able to perform both forward and backward analyses 
on our programs. 

3 Control Structures 

A control structure is an automaton $(Q, Q_0, Q_f, \rightsquigarrow, def)$, where 

- $Q$ is a finite set of locations, $Q_0$ and $Q_f$ are subsets of $Q$ (respectively called 
sets of initial and final locations). 

- $def$ is a function from $Q$ to $L$, such that $\bigcup_{q \in Q} \gamma(def(q)) = S$. In other words, 
$\{\gamma(def(q)) \mid q \in Q\}$ is a covering of $S$. Moreover, $Q_0$ and $Q_f$ are such that 

$$S_{init} \subseteq \bigcup_{q \in Q_0} \gamma(def(q)) \quad \text{and} \quad S_{error} \subseteq \bigcup_{q \in Q_f} \gamma(def(q))$$

- $\rightsquigarrow$ is a binary relation on $Q$ (transition relation), such that 

$$\big(\exists s \in \gamma(def(q)),\ \exists s' \in \gamma(def(q')) \text{ such that } s \rightarrow s'\big) \implies q \rightsquigarrow q'$$

In other words, a control structure is an abstraction of the program. Each location 
$q$ represents a set of concrete states $\gamma(def(q))$ exactly represented by an 
abstract value $def(q)$, and the transition relation $\rightsquigarrow$ is an upper approximation of 
the program transition relation. The important point is that, if there is no path 
in the automaton from an initial to a final location, then no bad state can be 
reached from an initial state in the program. 
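
The final check on a control structure is therefore an ordinary graph search over locations. The Python sketch below illustrates it with a breadth-first search; the function name `final_reachable`, the dictionary encoding of the transition relation, and the three-location example are all hypothetical.

```python
from collections import deque


def final_reachable(transitions, initial, final):
    """Breadth-first search over locations. Returns True if some final
    location can be reached from some initial location by following the
    abstract transition relation."""
    seen = set(initial)
    queue = deque(initial)
    while queue:
        q = queue.popleft()
        if q in final:
            return True
        for nxt in transitions.get(q, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical structure with three locations: "init", "mid" and "bad".
coarse = {"init": {"mid"}, "mid": {"mid", "bad"}}
refined = {"init": {"mid"}, "mid": {"mid"}}  # edge mid -> bad ruled out by analysis

print(final_reachable(coarse, {"init"}, {"bad"}))   # True: nothing is proved yet
print(final_reachable(refined, {"init"}, {"bad"}))  # False: bad states are unreachable
```

Because the abstract relation over-approximates the program, only the negative answer is conclusive: a path in the automaton need not correspond to a real execution.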



3.1 Analysis on a Control Structure 

We use forward and backward analyses on control structures. Forward analysis 
computes, for each location $q$, an upper approximation $reach(q)$ of the set of 
states in $\gamma(def(q))$ that are reachable from an initial state of the program. Backward 
analysis computes, for each location $q$, an upper approximation $coreach(q)$ 
of the set of states in $\gamma(def(q))$ from which one can reach a bad state (these 
states are said to be “coreachable” from bad states). 

Since we are interested in states that are both reachable from initial states, 
and co-reachable from bad states, we associate with each location $q$ an upper 
approximation $danger(q) \in L$ of the set of these states in $\gamma(def(q))$. Our goal 
is to show that, for each location $q$ (or, equivalently, for each location $q$ in $Q_0$), 
$danger(q) = \bot_L$. $danger(q)$ is computed by standard combination of forward 
and backward analyses: for any $X \in L$ let us note 

$post(X)$ the upper-approximation in $L$ of the set 
$\{s' \in S \mid \exists s \in X \text{ such that } s \rightarrow s'\}$ 
$pre(X)$ the upper-approximation in $L$ of the set 
$\{s \in S \mid \exists s' \in X \text{ such that } s \rightarrow s'\}$ 

In a first step, forward analysis computes $reach(q)$, $q \in Q$, as an upper approximation 
of the least solution of the system 

$$reach(q) = \bigsqcup_{q' \rightsquigarrow q} post(reach(q')) \sqcap def(q)$$

After convergence of this step, one can update the relation $\rightsquigarrow$ as: 

$$q \rightsquigarrow q' \iff post(reach(q)) \sqcap def(q') \neq \bot_L$$

Then, backward analysis can be applied to compute $danger(q)$, for each $q \in Q$, 
as an upper approximation of the least solution of the system 

$$danger(q) = \bigsqcup_{q \rightsquigarrow q'} pre(danger(q')) \sqcap reach(q)$$

Fig. 4. Initial control structure 



Again, the relation $\rightsquigarrow$ is updated by: 

$$q \rightsquigarrow q' \iff pre(danger(q')) \sqcap reach(q) \neq \bot_L$$

The definitions of locations are also updated, by taking $def(q) = danger(q)$, 
since only dangerous states are still of interest. Then a new forward analysis can 
be applied to the updated structure. 

Forward and backward analyses are performed this way, in alternation, until 
convergence. 
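
One forward pass followed by one backward pass can be sketched on a toy system. The Python code below is an illustration under strong simplifying assumptions, not the authors' tool: the abstract lattice is replaced by exact finite sets of concrete states (so no widening is needed), the forward system is seeded with the initial states and the backward system with the bad states that were found reachable, and the update of the transition relation is omitted. The six-state system, the three locations, and all names are hypothetical.

```python
def lfp(locations, edges, seed, image, bound):
    """Naive iteration to the least solution of
       val(dst) = seed(dst) | union over edges (src, dst) of image(val(src)) & bound(dst)
    on finite sets of concrete states."""
    val = {q: set(seed.get(q, ())) & bound[q] for q in locations}
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            new = image(val[src]) & bound[dst]
            if not new <= val[dst]:
                val[dst] |= new
                changed = True
    return val


# Toy concrete system: 0 -> 1 -> 2 -> 3 -> 3 and 4 -> 5 -> 5.
SUCC = {0: {1}, 1: {2}, 2: {3}, 3: {3}, 4: {5}, 5: {5}}


def post(xs):
    """Successors of a set of states."""
    return {t for s in xs for t in SUCC[s]}


def pre(xs):
    """Predecessors of a set of states."""
    return {s for s, ts in SUCC.items() if ts & xs}


# Coarse structure: the initial state 0, the bad state 5, and everything else.
DEF = {"init": {0}, "other": {1, 2, 3, 4}, "bad": {5}}
EDGES = [("init", "other"), ("init", "bad"), ("other", "other"), ("other", "bad")]

# Forward step: reachable states per location, bounded by def(q).
reach = lfp(DEF, EDGES, {"init": {0}}, post, DEF)
# Backward step: follow the edges in reverse, bounded by reach(q).
backward_edges = [(dst, src) for src, dst in EDGES]
danger = lfp(DEF, backward_edges, {"bad": {5}}, pre, reach)

print(reach)                                  # state 4 and state 5 are never reached
print(all(not d for d in danger.values()))    # True: every danger(q) is empty
```

In this toy case the forward pass alone already empties the final location; with an approximate numerical domain the backward pass and the alternation generally matter.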

3.2 Initial Control Structure 

We start from a very coarse control structure, which distinguishes initial states, 
final (bad) states, and others; ideally, $Q_0$ (resp. $Q_f$) should cover exactly initial 
(resp. bad) states 

$$\bigcup_{q \in Q_0} \gamma(def(q)) = S_{init}, \qquad \bigcup_{q \in Q_f} \gamma(def(q)) = S_{error}$$

Under reasonable assumptions about initial and bad states, this covering is possible. 
Otherwise, one can always add an extra initial and/or final state to the 
program, which can be characterized by a purely Boolean formula. For instance, in 
the program of Fig. 3, the invariance property “x>=y” has been “delayed” into 
a Boolean state variable ok, and “bad” states are those in which ok=false. In 
this initial control structure, before any analysis, the transition relation $\rightsquigarrow$ is 
assumed to be complete, except that locations in $Q_0$ and $Q_f$ are, respectively, 
sources and sinks. Fig. 4 shows the initial control structure for our example program. 
On this Figure, the definition $def(q)$ of each location $q$ is shown in a grey 
box. 

Notice that such a detailed covering of initial and bad states results from a 
deliberate choice: on one hand, it starts the analysis with a precise separation of 
relevant states; on the other hand, the influence of this separation on the analysis 
cost will not be dramatic: since locations in $Q_0$ and $Q_f$ never belong to loops, 
they will not be involved in iterations. 

Fig. 5. Refining a location 



4 Control Structure Refinement 

4.1 Principles 

Let us give the intuition of the kind of refinement we want to perform. Once the 
initial control structure is built, the analysis sketched in section 3.1 will associate 
with each location $q$ an abstract value $danger(q)$ (an upper approximation of the 
states in $def(q)$ that are both reachable from $S_{init}$ and coreachable from $S_{error}$), 
and possibly reduce the transition relation $\rightsquigarrow$. The goal is to obtain 

$$\forall q,\ danger(q) = \bot_L$$

and, of course, this goal is seldom achieved on the initial control structure. 
Refining a location $q$ consists in noticing that concrete states in $def(q)$ do not 
all show the same behavior, in terms of accessibility. 

For instance, consider the fragment of structure shown on Fig. 5.(a), and 
assume the situation depicted on the Figure. 
In such a case, splitting $q$ into $q'$ and $q''$, as shown in Fig. 5.(b), with 

$$post(def(q_1)) \subseteq \gamma(def(q')) \quad \text{and} \quad post(def(q_2)) \subseteq \gamma(def(q''))$$

clearly improves our knowledge about location accessibility. Now such refinements 
must be performed with care: arbitrary refinements may uselessly complicate 
the control structure, and even result in infinite computations. 

4.2 Refinement Heuristics 

In this section, we explain the kind of refinements implemented in our tool, 
which is specialized in Linear Relation Analysis. To avoid the risk of infinite 
refinements, we adopt the following restriction: locations will only be split according 
to conditions (Boolean formulas, or linear constraints) appearing in the program. 

Let $q$ and $q_1$ be two locations such that $q \rightsquigarrow q_1$. 

Assume there exists a condition $c$ appearing in the 
program, such that 

$$(def(q) \sqcap pre(q_1)) \sqcap (def(q) \sqcap \alpha(c)) = \bot_L$$

Then $q$ can be safely split into two locations 
$q'$ and $q''$, with $def(q') = def(q) \sqcap \alpha(c)$ and 
$def(q'') = def(q) \sqcap \alpha(\neg c)$, and $q' \not\rightsquigarrow q_1$. 

Notice that this case is rather common, since, in general, transitions are guarded 
by conditions, which naturally separate preconditions. 
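
The splitting rule can be illustrated on finite sets of concrete states. In the Python sketch below, abstract values are replaced by plain sets, which is a simplifying assumption; the function name `try_split` and the example data are hypothetical.

```python
def try_split(def_q, pre_q1, cond):
    """Apply the splitting rule on finite sets of concrete states.

    def_q  : states of location q
    pre_q1 : states that have a successor in location q1
    cond   : states satisfying the program condition c
    Returns the pair (def of q', def of q'') when the rule applies, else None.
    """
    if (def_q & pre_q1) & (def_q & cond):
        return None  # some state of q satisfying c can still lead to q1
    return def_q & cond, def_q - cond


# Hypothetical data: states are integers, c is "the state is even",
# and only odd states have a successor in q1.
q = {0, 1, 2, 3, 4, 5}
pre_q1 = {1, 3, 5, 7}
even = {s for s in range(10) if s % 2 == 0}

print(try_split(q, pre_q1, even))   # ({0, 2, 4}, {1, 3, 5})
print(try_split(q, {0, 1}, even))   # None: state 0 is even and leads to q1
```

In the first call the part of q satisfying c has no transition to q1, so the edge from q' to q1 can be dropped, which is the gain of the refinement.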

Our global refinement scheme consists in splitting one location at a time, 
performing (forward and backward) analysis until convergence, and repeating 
this process until either reaching the goal (all danger(q) empty), or obtaining 
the most detailed structure according to our set of conditions. We split only 
one location per step to avoid enlarging the structure unnecessarily: 
a single refinement, followed by an analysis, can eliminate the apparent 
need for other refinements. Of course, this choice may increase the number of steps 
before termination. 

This simple heuristic is surprisingly efficient: on one hand, in many examples, 
it splits locations enough to prove the property, and on the other hand, it does 
not generate many useless locations. 
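The global refinement scheme can be sketched as a small driver loop. This is only an illustration under stated assumptions: `analyze`, `proved` and `pick_split` are hypothetical callables standing for the forward/backward analysis, the test that every danger(q) is empty, and the choice of one location to split on a program condition; none of these names come from the tool itself.

```python
from typing import Callable, Optional, Tuple, TypeVar

S = TypeVar("S")  # hypothetical type of a control structure


def refine_until_proved(
    structure: S,
    analyze: Callable[[S], S],
    proved: Callable[[S], bool],
    pick_split: Callable[[S], Optional[S]],
) -> Tuple[S, bool]:
    """Split one location at a time, re-running the analysis after each split."""
    structure = analyze(structure)
    while not proved(structure):
        refined = pick_split(structure)
        if refined is None:
            # No program condition separates any location further:
            # the most detailed structure has been reached.
            return structure, False
        structure = analyze(refined)
    return structure, True


# Toy usage: a "structure" is just its number of locations (an assumption
# made for the demo), and the property becomes provable with 5 locations.
toy_result = refine_until_proved(
    3,
    analyze=lambda s: s,
    proved=lambda s: s >= 5,
    pick_split=lambda s: s + 1 if s < 40 else None,
)
```

Termination of the loop relies on `pick_split` eventually returning `None`, which mirrors the restriction that locations are only split according to the finitely many conditions appearing in the program.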

5 Examples 

5.1 The Simple Example 

Let us illustrate the proposed technique by detailing the analysis of our simple
example (Fig. 3). The initial control structure was shown in Fig. 4. Fig. 6(a)
shows the results of a forward analysis on this structure: the result for each
location appears in an ellipse. Backward analysis does not give any additional
information. Then, location $q_1$ is split according to the condition x > y (guard
of a transition, appearing in the program). Fig. 6(b) shows the result of forward
analysis on the refined structure. Finally, backward analysis disconnects the
initial location (Fig. 6(c)). The subsequent forward analysis leaves the structure
unchanged, thus completing the analysis.

Of course, this example is trivial, but it shows that the general approach is
successful: the verification steps remain exactly the same if we generalize the
program as a ring of n cycles of alternate incrementations, and/or by adding
more counters incremented alternately. This meets our goal that the verification
be independent of details irrelevant to the satisfaction of the property.

Fig. 6. Analysis of the simple example: (a) forward analysis on the initial structure; (b) refinement + forward analysis; (c) backward analysis

Fig. 7. CD controller (blocks Controller and Motor; signals plus and speed)

5.2 A Compact-Disk Controller 

As a more realistic example, we consider a (simplified) compact-disk controller, 
connected to a motor as shown by Fig. 7. 

The motor reacts to two logical signals, plus and minus, which make it run
faster or slower, respectively. When neither of these two signals is on, the speed
varies within two bounds (here, -4 and +4). A sensor gives the current speed.
The controller tries to maintain the motor speed within a specified range (here
[8, 12]), by sending appropriate signals plus and minus. Whenever the speed has
been outside the range for a given delay (here, 15), counted as a number of
steps, an error signal is sent. With our notations for “programs”, Fig. 8 gives the
assumptions on the motor, and the program of the controller.



Motor:

  speed'-speed <= 4;
  speed'-speed >= -4;
  plus => (speed'-speed > 0);
  minus => (speed'-speed < 0);

Controller:

  plus' = (speed <= 9);
  minus' = (speed >= 11);
  count' = if speed' < 8 or speed' > 12
           then count+1 else 0;
  error = (count >= 15);

Fig. 8. The CD controller example



We want to verify that the error signal is never sent. This means that, if 
the speed is initially 0, it must reach the correct range in time, and be properly 
controlled afterwards. The system has 2 Boolean and 2 numerical state variables. 
A partial evaluation of Boolean variables provides a structure with 4 locations, 
on which the property cannot be proved. The reason is that, in this example, 
the “relevant control” essentially comes from numerical thresholds. A further 
complete refinement according to these thresholds gives a structure with 4 x 5 x 
2 = 40 locations (4 Boolean states, 5 relevant intervals for speed, and 2 intervals 
for count). Our tool starts with an initial structure with 3 locations, and performs 
2 splits before proving the property, thus getting a structure with 5 locations. 
In fact, a manual examination shows that this is the minimal structure allowing 
the property to be verified. 
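For intuition only, the property can also be checked by brute-force exploration when the state space is made finite. The sketch below is not the Linear Relation Analysis performed by the tool: it assumes integer-valued speeds, an initial state with speed 0, count 0 and both plus and minus off, and it reads the equations of Fig. 8 as one synchronous step. The function names are hypothetical.

```python
from collections import deque


def successors(state):
    """One synchronous step of motor + controller (integer speeds assumed)."""
    speed, plus, minus, count = state
    for d in range(-4, 5):            # -4 <= speed' - speed <= 4
        if plus and d <= 0:           # plus  => speed' - speed > 0
            continue
        if minus and d >= 0:          # minus => speed' - speed < 0
            continue
        speed2 = speed + d
        outside = speed2 < 8 or speed2 > 12
        # plus' = (speed <= 9), minus' = (speed >= 11), count' as in Fig. 8
        yield (speed2, speed <= 9, speed >= 11, count + 1 if outside else 0)


def explore(initial, limit=15):
    """Breadth-first enumeration of reachable states; error states are not expanded."""
    seen, todo = {initial}, deque([initial])
    while todo:
        state = todo.popleft()
        if state[3] >= limit:         # error = (count >= 15)
            continue
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


reached = explore((0, False, False, 0))
error_reachable = any(count >= 15 for (_, _, _, count) in reached)
max_count = max(count for (_, _, _, count) in reached)
```

Under these assumptions the exploration terminates because the counter is capped, which in turn bounds the speed. Such enumeration does not scale to unbounded numerical domains, which is precisely where the partitioning approach applies.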
6 Related Works

The problem of the approximation due to the lub operation on an abstract lat- 
tice L is well-known and a general solution has already been proposed in the 
literature. A natural solution is to use the disjunctive completion of L [CC92a] 
which basically consists in manipulating non-redundant unions of elements of 
L. [Bou92a] proposes refined versions of this solution — with application to the 
computation of minimal function graphs — , where the number of disjunctions is 
limited by application of suitable widening operators. It is however not obvious 
that such solutions would be efficient in our case: decomposing synchronous pro- 
grams into smaller functions and applying interprocedural analysis leads to too 
rough results. We have then to consider the whole program as a big function and 
thus post- and pre-condition computations of abstract states involve themselves 
many unions, which can produce a combinatorial explosion if no abstract union 
is applied between two steps of the fixpoint computation. Moreover this solution 
is proposed in a context where the only goal is to compute good abstractions of 
concrete elements; computations are not guided by a property to be proved. 

[BGP97,BGL99] address the verification of CTL formulas on similar systems.
Sets of states are represented by disjunctions of conjunctions of Boolean and 
numerical domains. Although numerical domains are represented by Presburger 
arithmetic formulas, which are closed under set union, the separation of the 
Boolean and the numerical parts leads to the manipulation of explicit unions. 
This technique requires the partitioning of the state space of the program in such 
a way that on each partition post- and pre-condition can be separately computed 
on the Boolean and the numerical parts. This partition is coarser than our finest 
partition because numerical constraints in numerical terms are handled directly 
in Presburger arithmetic, but still remains potentially explosive. Here also, the
partition does not depend on the considered property.

7 Conclusion and Further Work 

This work was motivated by our attempt to apply abstract interpretation 
techniques to the verification of synchronous data-flow programs; these pro- 
grams don’t provide any control structure to be used for partitioning. Our first 
idea [Hal98,HPR97] was to consider the control structure implicitly represented 
by Boolean (or, more generally, finite state) variables. In practice, this solution 
is not satisfactory: on one hand, for real-size programs, it often produces
unmanageable control structures (combinatorial explosion); on the other hand, the
obtained control structure sometimes happens to be too coarse to allow the de- 
sired property to be proved. In this paper we propose to take into account the 
property in the selection of the control structure. An initial, very coarse, control 
structure is progressively refined, according to the needs of the verification. 

Another formulation of our problem is that we want to combine properties 
on numbers, with properties on Boolean variables. For this, we could have
designed a new lattice of abstract values, integrating, e.g., BDDs and polyhedra,
as in [Mau96]. Instead, our proposal consists in entrusting the control structure
with the Boolean part. An advantage is that it allows Boolean properties to be 
combined, at low cost, with any other abstract interpretation. 

For us, this work is a first confirmation of the practical usefulness of dynamic 
partitioning. Of course it must be completed: more experiments should be conducted,
which will probably suggest some refinements of our heuristics. Also, it should be 
applied to other interpretations, which may have specific features. For instance, 
in interval analysis, one can easily evaluate the distance between $\gamma(I \sqcup J)$ and
$\gamma(I) \cup \gamma(J)$, thus measuring the approximation involved by each lub operation.
This measure could obviously be used in the refinement heuristics. 
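As a minimal sketch of such a measure, assuming non-empty closed intervals with integer bounds represented as pairs (lo, hi), one can count the integers that the lub adds on top of the plain union. The function names are illustrative, not part of any tool.

```python
def join(i, j):
    """Interval lub: the smallest interval containing both arguments."""
    return (min(i[0], j[0]), max(i[1], j[1]))


def size(i):
    """Number of integers in the closed interval i (0 if lo > hi)."""
    return max(0, i[1] - i[0] + 1)


def lub_loss(i, j):
    """Integers in gamma(I lub J) that are in neither gamma(I) nor gamma(J)."""
    overlap = size((max(i[0], j[0]), min(i[1], j[1])))
    return size(join(i, j)) - (size(i) + size(j) - overlap)


gap_loss = lub_loss((0, 2), (5, 7))      # the join (0, 7) adds 3 and 4
no_loss = lub_loss((0, 5), (3, 7))       # overlapping intervals: exact join
```

A refinement heuristic could, for instance, prefer splitting the location whose lub operations show the largest loss.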
Abstract. We define an operational semantics for the Signal language
and design an analysis which allows one to verify properties pertaining to the
relation between values of the numeric and boolean variables of a reactive 
system. A distinguished feature of the analysis is that it is expressed 
and proved correct with respect to the source program rather than on 
an intermediate representation of the program. The analysis calculates a 
safe approximation to the set of reachable states by a symbolic fixed point 
computation in the domain of convex polyhedra using a novel widening 
operator based on the convex hull representation of polyhedra. 



1 Introduction 

Synchronous languages [11] such as Signal [2], Lustre [6] and Esterel [4] 
have been designed to facilitate the development of reactive systems. They enable 
a high-level specification and a modular design of complex reactive systems by 
structurally decomposing them into elementary processes. In this paper we show 
that semantics-based program analysis techniques originally developed for the 
imperative language paradigm can be applied to Signal programs, facilitating 
the design of static analyses for reactive systems. 

The verification of a reactive system written in a synchronous language is 
traditionally done by elaborating a finite model of the system (often as a finite- 
state machine) and then checking a property (e.g. liveness, dead-lock freedom, 
etc.) against this model (i.e. model checking). For instance, model checking has
been applied at an industrial scale to Signal programs to check properties such
as liveness, invariance and reachability [5]. Whereas model checking efficiently
decides properties of finite state systems, the use of techniques from static
analysis makes it possible to prove properties about infinite state systems, such as
properties on the linear relations between numerical quantities in the system.

In this paper we design an analysis for the Signal programming language 
that allows one to verify properties pertaining to the relation between values of
the numeric and boolean variables in a Signal program. The interest of the 
approach is that we analyse programs at the source language level, rather than 
doing the verification on some intermediate representation (often in the form of 
an automaton) of the program. In particular, it allows a proof of correctness of 
the analysis with respect to the operational semantics of the source language. 
The paper is structured as follows. In Sect. 2 and Sect. 3 we define the syntax
and the operational semantics of Signal. The analysis itself has three parts.
First, a Signal program is abstracted into a collection of constraint sets over 
the variables of the program. In Sect. 4 we present a syntax-directed program 
analysis that extracts these sets of constraints. A solution to one of these con- 
straint sets describes a possible behaviour of the program. An approximation to 
the set of reachable states of the program is obtained by a symbolic fixed point 
computation (Sect. 5) whose result is invariant for all the behaviours described 
by the constraint sets. This iterative calculation is done in the infinite domain of 
convex polyhedra. In order to ensure its convergence a novel widening technique 
that is described in Sect. 6 is used in the iterations. Section 7 discusses related 
work and Sect. 8 concludes. 

2 The SIGNAL Core Language 

We use a reduced version of the synchronous language Signal which we detail 
in this section. In Signal, a process p is either an equation or the synchronous
composition p || p' of processes. Parallel composition p || p' synchronises the
events produced by p and p'. The core language has the syntax defined below.
We assume given a set of integer and boolean constants, ranged over by c, and a
set, MonoOp, of basic operators such as addition and equality, ranged over by f.

Syntax of Core SIGNAL 

pgm → (eqn || ... || eqn) init mem
eqn → x := e | synchro e1 e2
e   → x | zx | c | f(e1 ... en) | e1 when e2 | e1 default e2
mem → zx = c | mem ; mem

where mem gives an initial value to all delay variables. Within a Signal pro- 
gram, three kinds of operators can be distinguished. 

— Basic “monochrone” operators such as ... which require that their
arguments are synchronised, i.e. they are either all present or all absent. When
all the arguments are present, they have their usual arithmetical semantics.

— “Polychrone” operators for which arguments are not necessarily synchronous. 
Signal provides two such operators. The when is used for sampling a signal: 
the signal defined by the expression x when b is obtained by sampling x at 
instances when b is true. The default (union) operator merges two signals 
giving precedence to the signal at the left of the operator. 

— In classical Signal, the delay operator $ is used to access the previous
value of a signal: the signal x $ 1 is synchronous with x itself and carries the
previous value of x. By replacing 1 by other numbers, this mechanism permits
access to values that were emitted several instances back, i.e. it provides a
mechanism for storing values in a memory cell for later access. We modify the
syntax of Signal in a way that makes this memorising explicit by forcing
the user to name the memory cells and specify their initial value. More
precisely, the only way to access the previous value of a signal x is via a 
specifically designated delay variable that we shall write zx. Access to values 
two instances back in the history of x must then be obtained by assigning 
the value of zx to another signal y and accessing its delay variable zy. To 
illustrate the last point: the equation x := ((x $ 1) +1) when tick is 
transformed into the program x := (zx + 1) when tick . 

The distinction between the two kinds of variables means that we have the set
of variables $Var = X \cup ZX$ where

— $X$ is the set of (observable) signals in the original program, ranged over by
x, y, z, ...

— $ZX = \{zx \mid x \in X\}$ is the set of memory (or delay) variables introduced
by the transformation (i.e. an isomorphic copy of $X$).

By convention, variable names prefixed by z will indicate delay variables. 

The Bathtub Example A typical yet simple example of reactive system that our 
analysis aims at handling is the bathtub problem. Suppose we have a bath which 
we wish to control so as to make sure that its water level never gets empty or 
overflows. We need both a faucet and a pump and a mechanism to activate them 
before some critical condition occurs. 

The level in the bathtub is increased by the faucet, decreased by the pump. 
The flow of the faucet increases as long as the level is low; likewise, the pump 
is emptying more actively at higher levels. An output alarm is raised whenever 
the level gets out of bounds. 



( level := zlevel + faucet - pump
| faucet := zfaucet + ((1 when zlevel <= 4) default (-1 when zfaucet > 0)
                       default 0)
| pump := zpump + ((1 when zlevel >= 7) default (-1 when zpump > 0)
                   default 0)
| alarm := (0 >= level) or (level >= 9)
)
init zlevel = 1; zfaucet = 0; zpump = 0; zalarm = false

Although it is simple to model such a system in Signal, it is not evident 
whether the alarm ever can be raised. The analysis presented in this paper allows 
such a verification. Even if this example is finite-state, the analysis to come is 
not limited to proving properties on such systems since it handles linear numeric 
properties for infinite ones. 
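To get a feel for the behaviour of the bathtub, one run can be simulated directly. The sketch below is an illustration, not the analysis of this paper: it assumes that every signal is present at every step (so the program behaves deterministically), and it reads the equations as level := zlevel + faucet - pump, with faucet and pump each adjusted by +1, -1 or 0 through the nested default. The helper names `step` and `run` are hypothetical.

```python
def step(mem):
    """Compute one label from the memory, assuming all signals are present."""
    zlevel, zfaucet, zpump = mem["zlevel"], mem["zfaucet"], mem["zpump"]
    # (1 when zlevel <= 4) default (-1 when zfaucet > 0) default 0
    faucet = zfaucet + (1 if zlevel <= 4 else -1 if zfaucet > 0 else 0)
    # (1 when zlevel >= 7) default (-1 when zpump > 0) default 0
    pump = zpump + (1 if zlevel >= 7 else -1 if zpump > 0 else 0)
    level = zlevel + faucet - pump
    alarm = (0 >= level) or (level >= 9)
    return {"level": level, "faucet": faucet, "pump": pump, "alarm": alarm}


def run(mem, steps):
    """Iterate `step`, storing each emitted value in its delay variable."""
    trace = []
    for _ in range(steps):
        label = step(mem)
        trace.append(label)
        mem = {"z" + name: value for name, value in label.items()}
    return trace


m0 = {"zlevel": 1, "zfaucet": 0, "zpump": 0, "zalarm": False}
trace = run(m0, 60)
levels = [label["level"] for label in trace]
```

A single simulated run says nothing about behaviours in which some signals are absent, which is why an analysis covering all behaviours is needed.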



3 Operational Semantics of Core SIGNAL 

A Signal program is modelled by a labeled transition system (Mem, Label, $m_0$)
with initial state $m_0$ where
— Value, the range of variables, is the union of the boolean and the integer
domains augmented by the element $\bot$ to model the absence of value.

$Value = Int \cup Bool \cup \{\bot\}$

— $Label = X \to Value$ is the set of all the potential instantaneous events that
can be emitted by the system.

— $Mem = ZX \to (Value - \{\bot\})$ is the set of memory states. A state of memory
$m \in Mem$ stores the value of each delay variable zx. Since it is the previous
value carried by its corresponding signal x, memory variables never take the
absent value $\bot$.

— A memory and a label together specify a value for each variable in Var. Such
a pair is called a state and belongs to the set

$State = Mem \times Label$.

In the following, we assume that variables are typed with either type Int or
Bool and that all labels, memories and states respect this typing. Values named
$u$, $v$, $u_i$ range over non-$\bot$ values whereas $k$ will range over the whole domain.

3.1 Semantics of Expressions 

Given a memory $m \in Mem$, the semantics of an expression $e$, written $\mathcal{E}[e]_m$, is
a set of pairs $(\lambda, v)$ where $\lambda \in Label$ is a map from observable signals to values
and $v$ is the current value of the expression. $\mathcal{E}[e]_m : (\lambda, v)$ expresses that $v$ is a
possible value of $e$ given that the observable signals of the program take values
as specified by $\lambda$. This function has type

$\mathcal{E}[\ ] : Expr \to Mem \to \mathcal{P}(Label \times Value)$

and is defined by a set of inference rules that for a given $e$ and $m$ inductively
define the set $\mathcal{E}[e]_m$.

Constants Constants can synchronise with any signal; thus, for any memory and
label, the constant expression can either evaluate to its value or be absent.

$$\mathcal{E}[c]_m : (\lambda, c) \qquad \mathcal{E}[c]_m : (\lambda, \bot)$$

Variables The evaluation of a program (non-delay) variable expression must
yield the value that the variable is assigned in the corresponding label.

$$\mathcal{E}[x]_m : (\lambda, \lambda(x))$$

Signal imposes a synchronisation constraint between a signal and its delay:
the delay variable can only be accessed when the signal itself is present. When
present, the delay variable expression retrieves in memory the previous value of
its associated signal; otherwise, both get the $\bot$ value.

$$\frac{\lambda(x) = u}{\mathcal{E}[zx]_m : (\lambda, m(zx))} \qquad \frac{\lambda(x) = \bot}{\mathcal{E}[zx]_m : (\lambda, \bot)}$$
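Monochrone Operators According to the monochrone rule, if all arguments
of an operator $f$ evaluate to a non-$\bot$ value then the result is the usual meaning
of this operator, otherwise all the arguments are absent and the expression
evaluates to $\bot$.

$$\frac{(\mathcal{E}[e_i]_m : (\lambda, \bot))_{i=1}^{n},\ f \in MonoOp}{\mathcal{E}[f(e_1 \ldots e_n)]_m : (\lambda, \bot)} \qquad \frac{(\mathcal{E}[e_i]_m : (\lambda, u_i))_{i=1}^{n},\ f \in MonoOp}{\mathcal{E}[f(e_1 \ldots e_n)]_m : (\lambda, f(u_1 \ldots u_n))}$$

When Operator If the condition is satisfied, the evaluation of the when expression
yields its first argument, otherwise $\bot$.

$$\frac{\mathcal{E}[e_1]_m : (\lambda, k),\ \mathcal{E}[e_2]_m : (\lambda, \bot)}{\mathcal{E}[e_1\ \mathtt{when}\ e_2]_m : (\lambda, \bot)} \qquad \frac{\mathcal{E}[e_1]_m : (\lambda, k),\ \mathcal{E}[e_2]_m : (\lambda, false)}{\mathcal{E}[e_1\ \mathtt{when}\ e_2]_m : (\lambda, \bot)}$$

$$\frac{\mathcal{E}[e_1]_m : (\lambda, k),\ \mathcal{E}[e_2]_m : (\lambda, true)}{\mathcal{E}[e_1\ \mathtt{when}\ e_2]_m : (\lambda, k)}$$

Default Operator The evaluation of the default expression yields $\bot$ if both
arguments are absent, otherwise its leftmost non-$\bot$ argument.

$$\frac{\mathcal{E}[e_1]_m : (\lambda, u),\ \mathcal{E}[e_2]_m : (\lambda, k)}{\mathcal{E}[e_1\ \mathtt{default}\ e_2]_m : (\lambda, u)} \qquad \frac{\mathcal{E}[e_1]_m : (\lambda, \bot),\ \mathcal{E}[e_2]_m : (\lambda, k)}{\mathcal{E}[e_1\ \mathtt{default}\ e_2]_m : (\lambda, k)}$$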
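The rules of this section can be read as a small interpreter that, for a fixed memory and label, returns every value an expression may take. The following sketch is an illustrative encoding, not the authors' implementation: expressions are assumed to be tuples such as ("const", c), ("var", x), ("delay", x), ("op", f, e1, ..., en), ("when", e1, e2) and ("default", e1, e2), the absent value is represented by `None`, and the memory cell of signal x is assumed to be stored under the key "z" + x.

```python
import itertools
import operator

BOT = None  # stands for the absent value


def values(e, m, lam):
    """All k such that the pair (lam, k) is derivable for e in memory m."""
    tag = e[0]
    if tag == "const":                      # a constant may be present or absent
        return {e[1], BOT}
    if tag == "var":                        # value assigned by the label
        return {lam[e[1]]}
    if tag == "delay":                      # synchronous with its signal
        x = e[1]
        return {BOT} if lam[x] is BOT else {m["z" + x]}
    if tag == "op":                         # monochrone rule
        f, args = e[1], [values(a, m, lam) for a in e[2:]]
        out = {BOT} if all(BOT in a for a in args) else set()
        for us in itertools.product(*[a - {BOT} for a in args]):
            out.add(f(*us))
        return out
    ks1, ks2 = values(e[1], m, lam), values(e[2], m, lam)
    if tag == "when":
        out = set()
        if ks1 and any(b is BOT or b is False for b in ks2):
            out.add(BOT)
        if any(b is True for b in ks2):
            out |= ks1
        return out
    if tag == "default":
        out = (ks1 - {BOT}) if ks2 else set()
        if BOT in ks1:
            out |= ks2
        return out
    raise ValueError(tag)


# Example: 1 when zlevel <= 4, with level present in the label.
m0 = {"zlevel": 1}
lam = {"level": 2}
cond = ("op", operator.le, ("delay", "level"), ("const", 4))
sampled = values(("when", ("const", 1), cond), m0, lam)
```

Returning a set reflects the non-determinism introduced by constants, which may synchronise with any signal.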

3.2 Semantics of a System of Equations 

A program is the parallel composition of two kinds of equations: assignments 
and synchronisations. Each equation imposes constraints on the labels that can 
be emitted by the system this equation belongs to. More precisely, given a mem- 
ory, the semantics of an equation is a set of possible labels inferred from the 
synchronisation and assignment rules. 

$\mathcal{E}q[\ ] : Eq \to Mem \to \mathcal{P}(Label)$

Synchronisation If both sides of the synchronisation equation evaluate either to
$\bot$ or a value and if their labels agree then these expressions are synchronised.

$$\frac{\mathcal{E}[e_1]_m : (\lambda, k_1),\ \mathcal{E}[e_2]_m : (\lambda, k_2),\ k_1 = \bot \Leftrightarrow k_2 = \bot}{\mathcal{E}q[\mathtt{synchro}\ e_1\ e_2]_m : \lambda}$$

Assignment If the value of the right-hand side agrees with the value of the
left-hand side stored in the label then an assignment can occur.

$$\frac{\mathcal{E}[e]_m : (\lambda, k),\ \lambda(y) = k}{\mathcal{E}q[y := e]_m : \lambda}$$
Parallel Composition The parallel composition rule checks that the same label
can be inferred for each equation. It means that this label describes a behaviour
consistent with all the equations.

$$\frac{\mathcal{E}q[eq_1]_m : \lambda \qquad \mathcal{E}q[eq_2]_m : \lambda}{\mathcal{E}q[eq_1 \parallel eq_2]_m : \lambda}$$



3.3 Transition Semantics of a Set of Equations 

For each variable, the memory state stores the value that it was given the step 
before. Hence every time an observable signal receives a new value, the memory 
state has to be updated with that value. The function 

$tr : Mem \times Label \to Mem$

defines how the system evolves from one memory state to another when emitting
a label $\lambda$.

$$tr(m, \lambda)(zx) = \begin{cases} \lambda(x) & \text{if } \lambda(x) \neq \bot \\ m(zx) & \text{otherwise} \end{cases}$$

A set of equations defines a transition relation between memory states.

Definition 1. A set of equations Eq induces a transition relation $\xrightarrow{\lambda}$ defined by

$$\frac{\mathcal{E}q[Eq]_m : \lambda}{m \xrightarrow{\lambda} tr(m, \lambda)}$$



3.4 Transition System Semantics of Programs 

The Signal syntax imposes that all delay variables are given an explicit initialisation;
the initial memory $m_0$ that assigns an initial value to all delay variables
can therefore be extracted from the program text directly. We can then define
the semantics of a program as a rooted, labeled transition system as follows:

Definition 2. The semantics of program $P = Eq\ \mathtt{init}\ m_0$ is defined by

$[\![P]\!] = (Mem, \to, m_0)$

The Bathtub Example (Continued) Given an initial memory state

$m_0 = \{zlevel \mapsto 1, zfaucet \mapsto 0, zpump \mapsto 0, zalarm \mapsto false\}$

we can derive the following (label, value)-pair for 1 when zlevel <= 4.

$\mathcal{E}[\mathtt{1\ when\ zlevel <= 4}]_{m_0} : (\lambda, 1)$

by considering $\lambda = \{level \mapsto 2, faucet \mapsto 1, pump \mapsto 0, alarm \mapsto false\}$ since
$m_0(zlevel)$ is less than 4. For any equation y := e of the bathtub example,
we can derive for the expression e the (label, value)-pair $(\lambda, \lambda(y))$ thus proving
$\mathcal{E}q[Bath]_{m_0} : \lambda$. Since no variable is absent in $\lambda$, all the memory variables are
updated. The new memory state calculated by the transition function tr given
$m_0$ and $\lambda$ is:

$tr(m_0, \lambda) = \{zx \mapsto \lambda(x) \mid zx \in \{zlevel, zfaucet, zpump, zalarm\}\}$
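The transition function is short enough to transcribe directly. In this sketch memories and labels are plain dictionaries, the absent value is `None`, and the delay variable of a signal x is assumed to be named "z" + x; these representation choices are assumptions made for illustration.

```python
BOT = None  # stands for the absent value


def tr(m, lam):
    """New memory: present signals overwrite their delay variable, absent ones keep it."""
    return {zx: (lam[zx[1:]] if lam[zx[1:]] is not BOT else m[zx]) for zx in m}


m0 = {"zlevel": 1, "zfaucet": 0, "zpump": 0, "zalarm": False}
lam = {"level": 2, "faucet": 1, "pump": 0, "alarm": False}
m1 = tr(m0, lam)

# If level were absent in the label, zlevel would keep its previous value.
m1_absent = tr(m0, dict(lam, level=BOT))
```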
4 Constraint-Based Analysis

In this section we present an analysis for determining invariants of the behaviour 
of a given Signal program. These invariants express relations between the val- 
ues of the program’s signals that hold at all instances during the execution of 
the program. We simplify the problem by considering invariants on the memory 
variables only. This is possible because values of observable signals are immedi- 
ately stored in their corresponding memory variables hence any relation found 
between memory variables is a valid relation between the corresponding observables.
Formally, we want to find $M \subseteq Mem$ such that if $m_0$ is the initial state
of a program and $m_0 \to^* m$ then $m \in M$.



4.1 Γ-Invariants

Given a program, each possible transition is completely specified by the memory
$m$ and the label of observable values $\lambda$ (the resulting state is then $tr(m, \lambda)$,
cf. Sect. 3.3). Thus, a subset $\Gamma$ of the set $State = Mem \times Label$ can be considered
as a restriction imposed on the behaviour of a program: a transition is only
allowed if it is a member of $\Gamma$. We say that a set of memory states is $\Gamma$-invariant
if it is invariant under all transitions authorised by $\Gamma$. Formally, $M \subseteq Mem$ is
$\Gamma$-invariant if

$\forall (m, \lambda) \in \Gamma.\ \text{if } m \in M \text{ then } tr(m, \lambda) \in M.$

This notion facilitates the handling of non-determinism of Signal programs. 
Different behaviours are possible in a given memory state depending on the 
absence or presence of a signal. It is then convenient to split the analysis into 
finding invariants for each possible combination of absence and presence in a 
program. More precisely, we structure the analysis in two phases: 

1. Determine a set $\{\Gamma_i\}_{i=1}^{n} \subseteq \mathcal{P}(State)$ of behaviour restrictions such that all
the $\Gamma_i$ together account for any possible behaviour of the program. Each $\Gamma_i$
will be constructed so that a given signal is either always present or always
absent in $\Gamma_i$.

2. Calculate an $M \subseteq Mem$ such that $M$ is $\Gamma_i$-invariant for all the $\Gamma_i$.

Each $\Gamma_i$ is the solution to a set of constraints resulting from an analysis
of the source program. The analysis never calculates the $\Gamma_i$ explicitly but uses
these sets of constraints in the calculation of the invariant M in phase 2. In 
the following we present the constraint-based analysis of the program and prove 
that the constraints found for a given program correctly approximate the possible 
behaviours of a program. 
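On finite sets, the definition of invariance under a behaviour restriction can be checked by enumeration. The sketch below is a toy illustration: it assumes a single signal x with delay variable zx, so that a memory is just the value of zx and a label is the value of x (`None` when absent). The analysis of this paper never enumerates states in this way.

```python
def is_invariant(M, Gamma, tr):
    """True iff every transition (m, lam) of Gamma starting in M ends in M."""
    return all(tr(m, lam) in M for (m, lam) in Gamma if m in M)


def tr(m, lam):
    """Toy transition: an absent signal leaves the memory unchanged."""
    return m if lam is None else lam


# Allowed behaviours: x toggles between 0 and 1, or is absent when zx = 0.
Gamma = {(0, 1), (1, 0), (0, None)}

both_values = is_invariant({0, 1}, Gamma, tr)
only_zero = is_invariant({0}, Gamma, tr)    # (0, 1) leads to memory 1
```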



4.2 Constraint Extraction 

In the proof to follow we consider programs in normal form. No loss of generality 
is incurred since any program can be translated into this form by recursively 
introducing extra variables and equations for each composite expression (see
Appendix A for details). The analysis will be carried out for these programs. 

pgm → eqn | eqn || pgm
eqn → x := c | x := x' | x := zx' | x := f(x1, ..., xn)
    | x := x1 when x2 | x := x1 default x2



Semantics of Constraints The language of constraints is defined by the following
syntax:

$cst \to y = e \mid y \neq \bot$
$e \to c \mid x \mid f(x_1, \ldots, x_n) \mid \bot$

Among these constraints, $y = f(x_1, \ldots, x_n)$ reflects the standard meaning of
monochrone operators. The constraints $y \neq \bot$ and $y = \bot$ express presence and
absence of signal $y$, respectively. A constraint set $C$ built over a set of variables
$V \subseteq Var$ symbolically represents a set $S \subseteq State$. The precise semantics of $C$ is
therefore given by the solution function $Sol$ defined such that $S = Sol(C)$.



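$Sol(C) = \bigcap_{c \in C} Sol(\{c\})$

$Sol(\{y \neq \bot\}) = \{\nu \mid \nu(y) \neq \bot\}$

$Sol(\{y = \bot\}) = \{\nu \mid \nu(y) = \bot\}$

$Sol(\{y = c\}) = \{\nu \mid \nu(y) = c\}$

$Sol(\{y = x\}) = \{\nu \mid \nu(x) = \nu(y)\}$

$Sol(\{y = f(x_1, \ldots, x_n)\}) = \{\nu \mid \nu(y) = f(\nu(x_1), \ldots, \nu(x_n)),\ \nu(x_1) \neq \bot, \ldots, \nu(x_n) \neq \bot\}$
$\qquad \cup\ \{\nu \mid \nu(y) = \bot,\ \nu(x_1) = \bot, \ldots, \nu(x_n) = \bot\}$

Fig. 1. Semantics of constraints

Definition 3. We extend the function $Sol$ to sets $\mathcal{C}$ of constraint sets as follows:

$Sol(\mathcal{C}) = \bigcup_{C \in \mathcal{C}} Sol(C)$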



Constraint Extraction Function 

Definition 4. The constraint extraction function CE computes for a program 
a set of constraint sets that over- approximates the possible behaviours of the 
program. 
Const     $CE(y := c) = \{\{y = c\}, \{y = \bot\}\}$

Var       $CE(y := x) = \{\{y = x\}\}$

Delay     $CE(y := zx) = \{\{x \neq \bot, y = zx\}, \{x = \bot, y = \bot\}\}$

MonoOp    $CE(y := f(x_1, \ldots, x_n)) = \{\{y = f(x_1, \ldots, x_n), y \neq \bot, x_1 \neq \bot, \ldots, x_n \neq \bot\}, \{y = \bot, x_1 = \bot, \ldots, x_n = \bot\}\}$

When      $CE(y := x\ \mathtt{when}\ b) = \{\{y = x, b = true\}, \{y = \bot, b = false\}, \{y = \bot, b = \bot\}\}$

Default   $CE(y := a\ \mathtt{default}\ b) = \{\{y = a, a \neq \bot\}, \{y = b, a = \bot\}\}$

Parallel  $CE(eq_1 \parallel \ldots \parallel eq_n) = \{C \mid C = \bigcup_{i=1}^{n} C_i \text{ and } C_i \in CE(eq_i)\}$

The Bathtub Example (Analysis) We consider the composition of the equations

level := zlevel + (faucet - pump)
| alarm := ((0 >= level) or (level >= 13))

Applying the constraint extraction function, one obtains a set of constraint sets
for each equation in isolation.

$CE(eq_1) = \{\{level = zlevel + faucet - pump,\ level \neq \bot,\ faucet \neq \bot,\ pump \neq \bot\},$
$\qquad \{level = \bot,\ faucet = \bot,\ pump = \bot\}\}$

$CE(eq_2) = \{\{alarm = ((0 \geq level)\ \mathtt{or}\ (level \geq 13)),\ level \neq \bot,\ alarm \neq \bot\},$
$\qquad \{level = \bot,\ alarm = \bot\}\}$

For composition of equations, a naive computation of the CE function would 
yield an exponential number of constraint sets. Fortunately, this worst case com- 
plexity can be avoided by incrementally discarding constraint sets for which no 
solution exists. For the example, since a signal cannot be both present and ab- 
sent, the composition gets rid of 50% of the constraint sets originally obtained. 
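The incremental pruning can be sketched as follows. This is a simplified illustration, not the authors' algorithm: constraints are assumed to be tuples ("eq", y, text), ("present", x) or ("absent", x), only the MonoOp rule is encoded, and a constraint set is discarded as soon as some signal is required to be both present and absent.

```python
def present(*xs):
    return frozenset(("present", x) for x in xs)


def absent(*xs):
    return frozenset(("absent", x) for x in xs)


def ce_monoop(y, rhs, xs):
    """CE(y := f(x1..xn)): everything present with y = rhs, or everything absent."""
    return [frozenset({("eq", y, rhs)}) | present(y, *xs), absent(y, *xs)]


def consistent(c):
    """No signal is marked both present and absent."""
    pres = {k[1] for k in c if k[0] == "present"}
    abse = {k[1] for k in c if k[0] == "absent"}
    return not (pres & abse)


def ce_parallel(*ces):
    """Compose equation by equation, pruning inconsistent unions at each step."""
    acc = [frozenset()]
    for ce in ces:
        acc = [c | d for c in acc for d in ce if consistent(c | d)]
    return acc


eq1 = ce_monoop("level", "zlevel + faucet - pump", ["faucet", "pump"])
eq2 = ce_monoop("alarm", "(0 >= level) or (level >= 13)", ["level"])
composed = ce_parallel(eq1, eq2)
naive_count = len(eq1) * len(eq2)
```

Pruning after each equation keeps the intermediate list small instead of first building the full product of all choices.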

4.3 Correctness 

The following theorem formalises how constraint sets serve to determine a safe 
approximation to the set of reachable memory states of a program. 

Theorem 1. Given a program $P = Eq\ \mathtt{init}\ m_0$, let $\mathcal{C} = CE(Eq)$ and let $M \subseteq Mem$.
If $M$ is $Sol(\mathcal{C})$-invariant and $m_0 \in M$ then $Reach(P) \subseteq M$.

Proof. The core of the proof is Lemma 1 which states that the set of constraint
sets extracted from a program over-approximates the set of transitions that the
program can make in a given memory state. As a consequence, all sets of memory
states that are invariant under $Sol(\mathcal{C})$ will be an invariant of the program. Thus,
if a $Sol(\mathcal{C})$-invariant $M \subseteq Mem$ contains the initial state $m_0$, an induction on
the number of transitions shows that if $m_0 \to^* m$ then $m \in M$.
Lemma 1. Given a set of equations Eq, let $Obs_m = \{\lambda \mid \mathcal{E}q[Eq]_m : \lambda\}$ and
$\mathcal{C} = CE(Eq)$; then $\{m\} \times Obs_m \subseteq Sol(\mathcal{C})$.

Proof. For each equation, we consider any derivation path allowed by the standard
semantics. The constraints on values gathered along each derivation describe
$Obs_m$ (i.e. the set of labels that can be deduced for a given memory). For
example, the deducible derivations for the equation y := a default b are:

$$\frac{\dfrac{\mathcal{E}[a]_m : (\lambda, \lambda(a)) \quad \mathcal{E}[b]_m : (\lambda, \lambda(b)) \quad \lambda(a) = u}{\mathcal{E}[a\ \mathtt{default}\ b]_m : (\lambda, u)} \quad \lambda(y) = u}{\mathcal{E}q[y := a\ \mathtt{default}\ b]_m : \lambda}$$

$$\frac{\dfrac{\mathcal{E}[a]_m : (\lambda, \lambda(a)) \quad \mathcal{E}[b]_m : (\lambda, \lambda(b)) \quad \lambda(a) = \bot}{\mathcal{E}[a\ \mathtt{default}\ b]_m : (\lambda, \lambda(b))} \quad \lambda(y) = \lambda(b)}{\mathcal{E}q[y := a\ \mathtt{default}\ b]_m : \lambda}$$

Since, by definition, $u \in Value - \{\bot\}$, $Obs_m$ is given the intensional definition

$Obs_m = \{\lambda \mid \lambda(y) = \lambda(a),\ \lambda(a) \neq \bot\} \cup \{\lambda \mid \lambda(y) = \lambda(b),\ \lambda(a) = \bot\}$

This set is then proved to be a subset of the solutions to the set of constraint
sets extracted from this equation. This proof scheme allows us to prove Lemma 1
for equations in isolation. Finally, for parallel composition, suppose that both
$Eq_1$ and $Eq_2$ admit a label $\lambda$ in memory $m$. By induction hypothesis there exist
$(C_i \in CE(Eq_i))_{i=1,2}$ such that $(m, \lambda)$ belongs to a solution of both $C_1$ and
$C_2$. Moreover, from the standard semantics ($\mathcal{E}q[Eq_1 \parallel Eq_2]_m : \lambda$) and from the
constraint extraction function $C_1 \cup C_2 \in CE(Eq_1 \parallel Eq_2)$. As a result, since
$(m, \lambda)$ belongs to a solution of $C_1 \cup C_2$, it follows that Lemma 1 is verified.

5 Fixed Point Computation 

The goal of this section is twofold. First, we provide a sufficient condition for
$Sol(C)$-invariance (Property 1). Based on this criterion, an over-approximation
of the reachable memory states can be defined as the solution of a system of fixed
point equations. Second, we abstract this system in the domain of convex polyhedra
to compute a solution (i.e. a finite set of polyhedra that over-approximates
the set of reachable memory states).

5.1 Fixed Point Systems 

A constraint set $C$ induces a symbolic transition function; this leads to another
characterisation of $Sol(C)$-invariance.

Definition 5. Given a set of memories $M \subseteq Mem$ and a constraint set $C$ such
that $Sol(C) \subseteq State$, we denote $Tr_C(M)$ such that:

$Tr_C(M) = \{m' \mid \exists m \in M,\ (m, \lambda) \in Sol(C).\ m' = tr(m, \lambda)\}$
It follows that $Sol(C)$-invariance can be reformulated by the following statement:

Corollary 1. $M$ is $Sol(C)$-invariant if and only if $Tr_C(M) \subseteq M$.

Property 1 Let $C$ be a constraint set, and let $Cov = \{R_i\}_{i=1}^{n}$ be a finite cover of
$M \subseteq Mem$. If

$\forall R \in Cov\ \exists R' \in Cov$ such that $Tr_C(R) \subseteq R'$

then $M$ is $Sol(C)$-invariant.



The Property 1 gives a strategy for verifying 5'oZ(C)-invariance. As a result, 
Theorem 1 and this straightforward property characterise the invariants of pro- 
gram’s behaviour as post fixed points of the operator Tr(T) and can thereby be 
calculated by iteration. More precisely, it yields a family of fixed point systems 
parameterised by the cover. For example, consider the fixed point system to solve 
when the cover is reduced to a singleton: 



{M D Trc(M)}ceC U {M D mo} 

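A more refined system can be built by associating each item of the cover to
a constraint set in $\mathcal{C}$ and solving the set of inequalities

$\{ M \supseteq M_C \}_{C \in \mathcal{C}} \;\cup\; \{ M \supseteq m_0 \} \;\cup\; \{ M_C \supseteq Tr_C(m_0) \}_{C \in \mathcal{C}} \;\cup\; \{ M_C \supseteq Tr_C(M_D) \}_{C, D \in \mathcal{C}}$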
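
As an illustration, the singleton-cover system can be solved by plain iteration whenever the sets involved are finite. The following Python sketch works on explicit finite sets of memories rather than on polyhedra; the memory encoding, the function `least_solution` and the two toy transition functions are assumptions made for this example and are not part of the analysis itself.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Memory = Tuple[int, ...]                  # hypothetical encoding of a memory state
MemSet = FrozenSet[Memory]
Transition = Callable[[MemSet], MemSet]   # plays the role of Tr_C for one C


def least_solution(m0: Memory, transitions: Iterable[Transition]) -> MemSet:
    """Smallest M with M >= {m0} and M >= Tr_C(M) for every C (finite case only)."""
    transitions = list(transitions)
    current: MemSet = frozenset([m0])
    while True:
        following = current
        for tr in transitions:
            following = following | tr(current)
        if following == current:          # post fixed point reached
            return current
        current = following


def increment(states: MemSet) -> MemSet:
    # Toy constraint set: the single delay variable grows by one while below 3.
    return frozenset((z + 1,) for (z,) in states if z < 3)


def reset(states: MemSet) -> MemSet:
    # Toy constraint set: the delay variable is reset once it reaches 3.
    return frozenset((0,) for (z,) in states if z == 3)


reachable = least_solution((0,), [increment, reset])
print(sorted(reachable))
```

On infinite state spaces this loop need not terminate, which is the motivation for the polyhedral abstraction and the widening discussed next.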



5.2 Convex Approximation 

However, two problems have to be addressed: 

— the sets in $\wp(Mem)$ can be infinite.

— there are infinite ascending chains in the lattice $(\wp(Mem), \subseteq)$.

A standard solution to these problems is to restrict the sets under consideration
to convex polyhedra [8]. This domain provides an interesting trade-off
between precision and computational efficiency: it is precise since it models linear
behaviours (with boolean behaviours as a special case) and efficient compared with
integer programming methods. Moreover, convex polyhedra have a finite, symbolic
description and widening operators can be defined to ensure the convergence of
fixed point iteration.

One inconvenience of using convex polyhedra is that non-linear constraints 
cannot be modelled accurately. In the present analysis we simply ignore any 
non-linear relation. This is safe but can lead to considerable loss of precision. 
Another inconvenience is that the accuracy of the analysis depends on the choice 
of the fixed point system to solve. Indeed, convex polyhedra are not closed under
union, which must be approximated by the convex hull operation. Because the
fixed point iteration process relies heavily on this operation, the precision depends
on how reachable states are grouped into polyhedra. This problem is overcome
by refining the system of fixed point equations according to Property 1.
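
The loss of precision caused by the convex hull can be seen already in one dimension, where convex polyhedra are intervals. The sketch below is only an illustration under that simplifying assumption; the names `hull` and `contains` are hypothetical helpers, not operations of any polyhedra library.

```python
from typing import Tuple

Interval = Tuple[int, int]   # closed interval [lo, hi]: a one-dimensional polyhedron


def hull(a: Interval, b: Interval) -> Interval:
    """Convex hull of two intervals: the smallest interval containing both."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def contains(iv: Interval, x: int) -> bool:
    return iv[0] <= x <= iv[1]


low, high = (0, 1), (5, 6)
merged = hull(low, high)

# Integer points that belong to the hull but to neither of the two operands.
spurious = [x for x in range(merged[0], merged[1] + 1)
            if not (contains(low, x) or contains(high, x))]
print(merged, spurious)
```

Keeping `low` and `high` as two separate items of the cover avoids the spurious points, at the price of a larger fixed point system.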

5.3 Symbolic Transition Function

To provide a computable symbolic transition function for a constraint set in $\mathcal{C}$,
we first normalise $\mathcal{C}$. This transformation, which preserves solutions, splits each
constraint set according to the presence (resp. absence) of signals. As a result, any
normalised constraint set is interpreted by a convex polyhedron stating constraints
on present signals and delay variables, plus a list of absent signals. For
such constraint sets, a symbolic transition function can be defined from the basic
polyhedral operations of intersection and projection by iterating the following
steps:

— Calculate the intersection of the polyhedra $M$ and $C$.

— Project this intersection onto the memory variables $z_x$ for which the
corresponding observable signal is present.

— Add the newly found memory states to those already found.

The first step of the transition consists in selecting the possible behaviours in
$C$ allowed by the memory states $M$. The second step consists in propagating
the information stored in the obtained polyhedron to the next state by projecting
signals carrying new values on their corresponding delay variables. The
analogy with the concrete semantics is straightforward: if a program signal $x$ is
constrained to $\bot$, the memory is unchanged ($z_x$ projected on $z_x$), otherwise, $x$
carries the update of $z_x$ ($x$ projected on $z_x$).



The Bathtub Example (Fixed Point). For this example, the constraint extraction
algorithm yields 32 constraint sets that summarise any potential behaviour of
the program. Among these, 20 sets raise the alarm under given conditions on the
memory states. The analysis will find that none of these conditions is met by
any reachable state (i.e. no reachable memory state raises the alarm).

The bathtub example does not require sophisticated fixed point iteration to
check the property. Yet, we apply a general scheme that yields a trade-off between
accuracy and efficiency. This strategy consists in gathering in a polyhedron $P_{C_i}$ the
memory states reached by a constraint set $C_i$ that does not raise the alarm,
whereas memory states that raise the alarm are gathered in a single polyhedron.
For example, the constraint sets

$C_1 = \{\, level = zlevel + faucet - pump,\ faucet = zfaucet + 1,\ pump = zpump,\ alarm = false,\ zfaucet \le 0,\ zpump \le 0,\ zlevel \le 4,\ 1 \le level \le 8 \,\}$

$C_2 = \{\, level = zlevel + faucet - pump,\ faucet = zfaucet + 1,\ pump = zpump,\ alarm = false,\ zfaucet \ge 1,\ zpump \le 0,\ zlevel \le 4,\ 1 \le level \le 8 \,\}$

$C_3 = \{\, level = zlevel + faucet - pump,\ faucet = zfaucet - 1,\ pump = zpump + 1,\ alarm = false,\ zfaucet \ge 1,\ zpump \le 0,\ zlevel \ge 7,\ level \le 8 \,\}$

lead to an iteration the first steps of which are

$M_0 = \{\, zlevel = 1,\ zfaucet = 0,\ zalarm = false,\ zpump = 0 \,\}$

$M^0_{C_1} = Tr_{C_1}(M_0) = \{\, zlevel = 2,\ zfaucet = 1,\ zalarm = false,\ zpump = 0 \,\}$

$M^0_{C_2} = Tr_{C_2}(M^0_{C_1}) = \{\, zlevel = 4,\ zfaucet = 2,\ zalarm = false,\ zpump = 0 \,\}$

$M^1_{C_2} = M^0_{C_2} \cup Tr_{C_2}(M^0_{C_2}) = \{\, zlevel - 3\,zfaucet + 2 = 0,\ 2 \le zfaucet \le 3,\ zalarm = false,\ zpump = 0 \,\}$

$M^0_{C_3} = Tr_{C_3}(M^1_{C_2}) = \{\, zlevel = 8,\ zfaucet = 2,\ zalarm = false,\ zpump = 1 \,\}$
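
The first steps of this iteration can be replayed on concrete memory states. The Python sketch below is a hedged illustration, not the analyser: it enumerates individual states instead of manipulating polyhedra, it assumes that the guards of the three constraint sets are the non-strict inequalities that reproduce the values listed above, and the names `c1`, `c2`, `c3` and `tr` are hypothetical.

```python
from typing import Callable, FrozenSet, Optional, Tuple

Memory = Tuple[int, int, int]   # (zlevel, zfaucet, zpump); zalarm stays false here
Constraint = Callable[[Memory], Optional[Memory]]


def c1(m: Memory) -> Optional[Memory]:
    zlevel, zfaucet, zpump = m
    faucet, pump = zfaucet + 1, zpump
    level = zlevel + faucet - pump
    ok = zfaucet <= 0 and zpump <= 0 and zlevel <= 4 and 1 <= level <= 8
    return (level, faucet, pump) if ok else None


def c2(m: Memory) -> Optional[Memory]:
    zlevel, zfaucet, zpump = m
    faucet, pump = zfaucet + 1, zpump
    level = zlevel + faucet - pump
    ok = zfaucet >= 1 and zpump <= 0 and zlevel <= 4 and 1 <= level <= 8
    return (level, faucet, pump) if ok else None


def c3(m: Memory) -> Optional[Memory]:
    zlevel, zfaucet, zpump = m
    faucet, pump = zfaucet - 1, zpump + 1
    level = zlevel + faucet - pump
    ok = zfaucet >= 1 and zpump <= 0 and zlevel >= 7 and level <= 8
    return (level, faucet, pump) if ok else None


def tr(constraint: Constraint, states: FrozenSet[Memory]) -> FrozenSet[Memory]:
    """Concrete counterpart of Tr_C: successors of the states allowed by C."""
    return frozenset(n for n in map(constraint, states) if n is not None)


m0 = frozenset([(1, 0, 0)])
m_c1 = tr(c1, m0)
m_c2 = tr(c2, m_c1)
m_c2_next = m_c2 | tr(c2, m_c2)
m_c3 = tr(c3, m_c2_next)

# Every state gathered for C2 lies on the line zlevel - 3*zfaucet + 2 = 0.
on_line = all(zl - 3 * zf + 2 == 0 for (zl, zf, _) in m_c2_next)
print(sorted(m_c1), sorted(m_c2_next), sorted(m_c3), on_line)
```

The two concrete states gathered for $C_2$ are the end points of the segment that the polyhedral iteration obtains by convex hull.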



6 Convex Hull Based Widening 

Convex polyhedra have two dual representations. The representation most
frequently used in program analysis is as solutions of sets of linear constraints:

$P = Sol\left(\left\{ \sum_{j=1}^{m} a_{ij} \cdot x_j \ge b_i \right\}\right)$ where $a_{ij}, b_i \in \mathbb{Z}$

Another representation is in terms of the convex hull of a set of vertices, extended
with a listing of the directions in which the polyhedron extends infinitely:

Definition 6. A vertex of a convex polyhedron $P$ is any point in $P$ that cannot
be expressed as a convex combination of other distinct points in $P$.

Definition 7. A ray of a convex polyhedron $P$ is a vector $r$, such that $x \in P$
implies $(x + \mu r) \in P$ for all $\mu \ge 0$. A ray of a convex polyhedron $P$ is said to
be extreme if and only if it cannot be expressed as a positive combination of any
other two distinct rays of $P$. The set of extreme rays forms a basis which describes
all directions in which the convex polyhedron is open.

Definition 8. A line (or bidirectional ray) of a polyhedron $P$ is a vector $l$, such
that $x \in P$ implies $(x + \mu l) \in P$ for all $\mu \in \mathbb{Q}$.

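Theorem 2. Every polyhedron $P$ can be written as follows:

$P = \left\{\, x \;\middle|\; x = \sum_{i=1}^{\sigma} (\lambda_i \cdot s_i) + \sum_{j=1}^{\rho} (\mu_j \cdot r_j) + \sum_{k=1}^{\delta} (\nu_k \cdot d_k) \,\right\}$,

where $0 \le \lambda_i \le 1$, $\sum_{i=1}^{\sigma} \lambda_i = 1$, $0 \le \mu_j$ and $s_i \in$ vertices, $r_j \in$ rays, $d_k \in$ lines.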

A minimal normalised representation can be exhibited for both representations 
[17]. This property is essential for defining a widening operator. 

Widening issues for convex polyhedra were first investigated by Cousot and
Halbwachs [8,10]. Their widening strategy is based on the cardinality of the
constraint form: after a bounded number of iterations, the minimal number of
linear constraints needed to represent a polyhedron must strictly decrease at each
iteration. Since this number is finite, convergence is ensured. The widening
operator derived from this strategy only keeps constraints that were invariant
by the previous iteration step. We will highlight the weaknesses of this widening
and present an improved widening operator.
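
To make the standard strategy concrete, the following Python sketch instantiates it in one dimension, where a polyhedron is an interval described by at most two constraints (a lower and an upper bound). It is an illustration of the "keep only the invariant constraints" idea under that simplifying assumption, not the operator proposed in this paper; `widen`, `hull`, `iterate` and `count_up` are hypothetical names.

```python
import math
from typing import Callable, Tuple

Interval = Tuple[float, float]   # (lower bound, upper bound), possibly infinite


def hull(a: Interval, b: Interval) -> Interval:
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(old: Interval, new: Interval) -> Interval:
    """Keep a bound of `old` only if `new` still satisfies it; otherwise drop it."""
    lower = old[0] if new[0] >= old[0] else -math.inf
    upper = old[1] if new[1] <= old[1] else math.inf
    return (lower, upper)


def iterate(start: Interval, step: Callable[[Interval], Interval]) -> Interval:
    """Fixed point iteration where every increasing step goes through widening."""
    current = start
    while True:
        candidate = hull(current, step(current))
        if candidate == current:
            return current
        current = widen(current, candidate)


def count_up(iv: Interval) -> Interval:
    # Toy transition: a counter incremented at every step.
    return (iv[0] + 1, iv[1] + 1)


limit = iterate((0, 0), count_up)
print(limit)
```

Each widening step can only remove constraints, so the number of constraints decreases and the iteration stops after at most two widenings in this one-dimensional setting.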

First, the dimension of a polyhedron is not abstracted correctly by the number
of constraints. According to the standard widening strategy, the widening of a
square by a cube leads to a semi-infinite square section. Our strategy accepts the
initial cube as the result of the widening. Furthermore, intuitively, closed
polyhedra are smaller than open ones. Our strategy will formally take this fact
into account. As another weakness, consider the fixed point iteration involving a
triangle shown in Fig. 2. The infinite computation leads to a solution (a
half-band) described by the same number of constraints as the initial triangle.
The standard widening strategy cannot produce this limit whereas ours does
while ensuring convergence.

Fig. 2. Limit out of the scope of widening strategy

6.1 Convex Hull Based Widening 

The standard widening strategy uses the constraint representation of polyhedra;
we propose an alternative relying on the convex hull representation. Whereas the
first representation is abstracted by one parameter (the number of constraints),
the latter is abstracted by four parameters: the dimension and the numbers of
vertices, extreme rays and lines. Examples argue that these parameters give a more
precise description than the number of constraints. Moreover, we establish that
the following widening strategy respects the finite ascending chain property. Let
$v = |\text{vertices}|$, $r = |\text{extreme rays}|$, $l = |\text{lines}|$ and $d$ the polyhedron
dimension. Let $id = r + 2 \cdot l$ be the number of semi-infinite directions.

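Theorem 3. Let $P_0 \subseteq P_1 \subseteq \ldots \subseteq P_n \subseteq \ldots$ be an ascending chain of polyhedra.
If for all $i$ in the chain one of the following statements holds

— $d_{P_i} < d_{P_{i+1}}$

— $d_{P_i} = d_{P_{i+1}} \wedge id_{P_i} < id_{P_{i+1}}$

— $d_{P_i} = d_{P_{i+1}} \wedge id_{P_i} = id_{P_{i+1}} \wedge v_{P_i} > v_{P_{i+1}}$

then the ascending chain stabilises ($\exists n\ \forall i \ge n.\ P_i = P_{i+1}$).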
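
The three alternatives of Theorem 3 amount to a lexicographic comparison of the parameters $(d, id, v)$. The Python sketch below encodes that comparison on the parameters alone; the `Shape` record and the function `makes_progress` are hypothetical names introduced for illustration, and the sketch does not compute the parameters from an actual polyhedron.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Parameters abstracting a polyhedron in its convex hull representation."""
    dimension: int
    vertices: int
    extreme_rays: int
    lines: int

    @property
    def semi_infinite_directions(self) -> int:
        # id = r + 2 * l, as defined in the text
        return self.extreme_rays + 2 * self.lines


def makes_progress(p: Shape, q: Shape) -> bool:
    """True if the step from p to q satisfies one of the three alternatives."""
    if p.dimension != q.dimension:
        return p.dimension < q.dimension
    if p.semi_infinite_directions != q.semi_infinite_directions:
        return p.semi_infinite_directions < q.semi_infinite_directions
    return p.vertices > q.vertices


square = Shape(dimension=2, vertices=4, extreme_rays=0, lines=0)
cube = Shape(dimension=3, vertices=8, extreme_rays=0, lines=0)
triangle = Shape(dimension=2, vertices=3, extreme_rays=0, lines=0)
half_band = Shape(dimension=2, vertices=2, extreme_rays=1, lines=0)

print(makes_progress(square, cube), makes_progress(triangle, half_band))
```

Both examples discussed earlier are accepted: going from a square to a cube increases the dimension, and going from a triangle to a half-band increases the number of semi-infinite directions.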
We propose two widening techniques for polyhedra respecting this new widening
strategy: decrease of the number of vertices, increase of the number of extreme
rays. In the following, $P$ and $Q$ denote two polyhedra for which we intend
to compute the widening polyhedron $R = P \nabla Q$.

One technique consists in reducing the number of vertices by selecting a vertex
belonging to the convex hull of both $P$ and $Q$. If it is satisfied by constraints
that evolved since the previous iteration, then these constraints are replaced by
the constraint obtained by a normed linear combination of them. This
transformation can be interpreted as a projection along a suitable direction.
Typically, it is relevant to apply this heuristic to the example of Fig. 3.




Fig. 3. An infinite band rather than a half-space



The second technique uses the fact that convergence is ensured if the number
of extreme rays is increased by at least one. As a consequence, an added ray $R$
must be an extreme ray that does not make any existing extreme ray
redundant. Formally, $R$ must be a solution of the following system, where rays
includes extreme rays and lines (bidirectional rays):

$\forall v, w \in \text{rays},\ \nexists \lambda, \mu \ge 0,\ \lambda \cdot v + \mu \cdot w = R$
$\forall v, w \in \text{rays},\ \nexists \lambda, \mu \ge 0,\ \lambda \cdot v + \mu \cdot R = w$

Among these solutions, a good direction for this new ray is chosen by two
heuristics. The first one assumes that the polyhedral center follows a linear
trajectory defined by the vector $\vec{u}$ and computes the ray closest to this direction.
It amounts to maximising $\vec{u} \cdot R$ under the previous constraints. The second makes
a hypothesis similar to the standard widening and gives conditions so that the
additional ray does not weaken constraints invariant by the iteration step. Let $C$
be this set. Formally these conditions are expressed by the following system:

$\forall c \in C,\ R \cdot v_c^{\perp} \ge 0$

where $v_c^{\perp}$ is the vector orthogonal to $c$.

7 Related Work 

Semantics Previous works [16] showed how denotational semantics can be used 
as a foundation for deriving clock analyses for data-flow synchronous languages 
(Lustre and Signal) by abstract interpretation. In this paper we have shown
how a small-step operational semantics can serve to prove correctness of
data-flow analyses for such languages. For this, we have defined an operational
semantics for Signal that differs from the existing operational semantics of Signal
[3] and Esterel [4] in the way that delay variables are handled. The exist- 
ing semantics rewrite the program such that the value to be memorised appears 
syntactically in the program, thus directly incorporating states into the program 
syntax. Rather than using rewrite semantics, our operational framework main- 
tains an explicit distinction between the system of equations, the state and its 
instantaneous valuation. In this respect it is closer to usual operational seman- 
tics for imperative languages with state. This proximity makes it in our opinion 
simpler to adapt existing static analysis techniques to this new paradigm. The 
semantics is defined for the Signal language but we believe that it should be 
easy to modify in order to model Lustre and Esterel. 

Analysis of Synchronous Programs The notion of synchronous observer provides 
a means of expressing safety properties in the programming language itself. A 
synchronous observer is a program composed with the system to verify. It does 
not influence the system but spies its behaviour and emits an alarm if the desired 
property is not satisfied. Verifying that such an alarm will never be emitted 
consists in reachability analysis. This methodology is applied to express and 
verify boolean safety properties of synchronous Lustre programs [13,14,12]. 
Under these conditions, the effective computation of reachable states is done by 
model checking techniques. Our approach extends this to integer-valued signals. 

Polyhedra-Based Analyses In the framework of abstract interpretation [7], linear
constraint analysis was first defined for an imperative language [8]. It associates 
to each program point a polyhedron that safely approximates the set of mem- 
ory states that can reach that point. First, the analysis derives a system of 
equations that describes safely in terms of polyhedra the meaning of each im- 
perative construct. Second, this system is solved by fixed point iteration with a 
widening operator to ensure the convergence. We have shown how to apply this 
analysis strategy to the synchronous programming paradigm. Analyses based on 
convex polyhedra have been applied to linear hybrid automata: an extension of 
finite-state machine that models time requirements. A configuration of such an 
automaton consists of a control location and a clock valuation. Clocks evolve 
linearly with time in a control location and can be assigned linear expressions 
when a transition, guarded by linear constraints, occurs. Halbwachs et al. adapt 
the analysis of [8] to deal with a class of linear hybrid automata and approx- 
imate the reachable configurations of any location by a polyhedron [15]. The 
time elapse is modelled by a polyhedron transformation. Following the previous 
method, a system of equations is derived and solved by fixed point iteration. 
For a restricted class of linear hybrid automata, the timed automata, the model
checking of TCTL formulas is proved decidable [1]. This is used in the Kronos
tool [9]. Apart from our time model being discrete, the main difference is that
we handle arbitrary linear assignments and guards. The price for this generality
is that we in general calculate an over-approximation of the real answer. In
synchronous programming, linear relation analysis is also applied to approximate
the behaviour of delay counters [10]. The polyhedral equations are derived
for the interpreted automaton produced by the Esterel compiler. In practice,
this makes it possible to check properties and to remove unreachable states and
transitions from the interpreted automaton. We use the same technology based on
polyhedra but propose a new widening operator that can ensure convergence where
the standard widening based on decreasing the number of constraints would fail.
Furthermore, our framework allows us to prove correctness of the analysis with
respect to the original program semantics; this is to our knowledge the first
time that this has been done for synchronous programs.

8 Conclusions 

We have presented a semantics-based static analysis for determining linear
relations between variables in Signal programs via a fixed point calculation with
widening in the domain of convex polyhedra. The paper makes the following
contributions:

— A simple, state-based operational semantics of Signal that clearly separates 
the program’s syntax and the transition system that models it. 

— A constraint-based analysis that produces a system of equations whose
solution is the property analysed for. This analysis is proved correct with
respect to the operational semantics.

— A novel widening operator for the domain of polyhedra based on their 
convex-hull representation. This widening operator ensures convergence where 
widening based on reducing the number of linear constraints fails. 

A prototype implementation of the analysis has allowed preliminary experiments
on Signal programs up to approximately 60 equations. The analyser is
implemented with the polyhedra library produced by the API project at IRISA¹ and is
interfaced with a generic fixed point solver developed by IRISA's Lande project.

Acknowledgments: Thanks are due to Mirabelle Nebut for extensive comments
on an earlier version of the paper.



A Translation 



We present a simplified translation scheme by a set of rewrite rules to apply to
equations until they belong to the restricted form.

$x := f(e_1, \ldots, e_n) \;\Rightarrow\; x_1 := e_1 \mid \ldots \mid x_n := e_n \mid x := f(x_1, \ldots, x_n)$

$x := e_1 \text{ when } e_2 \;\Rightarrow\; x_1 := e_1 \mid x_2 := e_2 \mid x := x_1 \text{ when } x_2$

$x := e_1 \text{ default } e_2 \;\Rightarrow\; x_1 := e_1 \mid x_2 := e_2 \mid x := x_1 \text{ default } x_2$

$\text{synchro } e_1\ e_2 \;\Rightarrow\; x_1 := e_1 \mid x_2 := e_2 \mid t_1 := x_1 = x_1 \mid t_2 := x_2 = x_2 \mid s := t_1 = t_2$
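
The first three rules name every operand of an equation with a fresh variable and are applied until no nested expression remains. A minimal Python sketch of this flattening is given below, under stated assumptions: expressions are encoded as tuples `(operator, operand, ...)`, variables as strings, fresh names are generated as `_x1`, `_x2`, ..., operands that are already variables are kept as they are rather than renamed, and the `synchro` rule is not covered. The function `flatten` and this encoding are hypothetical and only serve the illustration.

```python
import itertools
from typing import Iterator, List, Optional, Tuple, Union

Expr = Union[str, tuple]                       # a variable name or (op, arg, ...)
Equation = Tuple[str, str, Tuple[str, ...]]    # (target, operator, operand names)


def flatten(target: str, expr: tuple,
            fresh: Optional[Iterator[str]] = None) -> List[Equation]:
    """Rewrite `target := expr` into equations whose operands are all variables."""
    if fresh is None:
        fresh = (f"_x{i}" for i in itertools.count(1))
    op, *args = expr
    equations: List[Equation] = []
    names: List[str] = []
    for arg in args:
        if isinstance(arg, str):               # already a variable: keep it
            names.append(arg)
        else:                                  # nested expression: name it first
            name = next(fresh)
            equations.extend(flatten(name, arg, fresh))
            names.append(name)
    equations.append((target, op, tuple(names)))
    return equations


# x := (a + b) when c
result = flatten("x", ("when", ("+", "a", "b"), "c"))
for equation in result:
    print(equation)
```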



¹ See http://www.irisa.fr/API

Abstract. Complementation, the inverse of the reduced product operation,
is a relatively new technique for systematically finding minimal
decompositions of abstract domains. Filé and Ranzato advanced the state
of the art by introducing a simple method for computing a complement.
As an application, they considered the extraction by complementation
of the pair-sharing domain $PS$ from Jacobs and Langen's set-sharing
domain $SH$. However, since the result of this operation was still $SH$,
they concluded that $PS$ was too abstract for this. Here, we show that
the source of this difficulty lies not with $PS$ but with $SH$ and, more
precisely, with the redundant information contained in $SH$ with respect
to ground-dependencies and pair-sharing. In fact, the difficulties vanish
if our non-redundant version of $SH$, $SH^{\rho}$, is substituted for $SH$. To
establish the results for $SH^{\rho}$, we define a general schema for subdomains
of $SH$ that includes $SH^{\rho}$ and $Def$ as special cases. This sheds new light
on the structure of $SH^{\rho}$ and exposes a natural though unexpected
connection between $Def$ and $SH^{\rho}$. Moreover, we substantiate the claim that
complementation alone is not sufficient to obtain truly minimal
decompositions of domains. The right solution to this problem is to first remove
redundancies by computing the quotient of the domain with respect to
the observable behavior, and only then decompose it by complementation.

Keywords: Abstract Interpretation, Domain Decomposition, Comple- 
mentation, Sharing Analysis. 



1 Introduction 

Complementation [5], which is the inverse of the well-known reduced product
operation [7], can systematically obtain minimal decompositions of complex
abstract domains. It was argued that these decompositions would be useful in
finding space saving representations for domains and in simplifying domain
verification problems.

* Part of this work was supported by EPSRC grant GR/M05645.

In [8], Filé and Ranzato presented a new method for computing the complement,
which is simpler than the original proposal by Cortesi et al. [4,5] because,
in order to compute the complement, only a relatively
small number of elements (namely the meet-irreducible elements of the reference
domain) need be considered. As an application of this method, the authors
considered Jacobs and Langen's sharing domain [13], $SH$, for representing
properties of variables such as groundness and sharing. This domain captures
the property of set-sharing. However, it has been observed [1] that for most (if
not all) applications, the property of interest is not set-sharing but pair-sharing.
Filé and Ranzato illustrated their method by minimally decomposing $SH$ into
three components; using the words of the authors [8, Section 1]:

“each representing one of the elementary properties that coexist in the 
elements of Sharing, and that are as follows: (i) the ground-dependency 
information; (ii) the pair-sharing information, or equivalently variable 
independence; (iii) the set-sharing information, without variable inde- 
pendence and ground-dependency.” 

However, this decomposition did not use the usual domain $PS$ for pair-sharing.
Filé and Ranzato observed that the complement of the pair-sharing domain
$PS$ with respect to $SH$ is again $SH$ and concluded that $PS$ was too abstract
to be extracted from $SH$ by means of complementation. In order to overcome
this difficulty, and to obtain this non-trivial decomposition of $SH$, they used
a different (and somewhat unnatural) definition for an alternative pair-sharing
domain that they called $PS'$. The nature of $PS'$ and its connection with $PS$ is
examined more carefully in Section 6.

We noticed that the difficulty Filé and Ranzato had was not in the definition
of $PS$, which accurately represents the property of pair-sharing, but in the use of
the set-sharing domain $SH$ itself since it carries some redundant information. It
was shown in [1] that, for groundness and pair-sharing, $SH$ includes redundant
information. By defining an upper closure operator $\rho$ that removed this
redundancy, a much smaller domain $SH^{\rho}$ was found that captured pair-sharing with
the same precision as $SH$. We show here that using the method given in [8],
but with this domain instead of $SH$ as the reference domain, the difficulties in
the decomposition disappear. Moreover, we show that $PS$ is exactly one of the
components obtained by complementation of $SH^{\rho}$. Thus the problem exposed
by Filé and Ranzato was, in fact, due to the "information preserving" property
of complementation, as any factorization obtained in this way is such that the
reduced product of the factors gives back the original domain. In particular, any
factorization of $SH$ has to encode the redundant information identified in [1].
We will show that such a problem disappears when $SH^{\rho}$ is used as the reference
domain.

Although the primary purpose of this work is to clarify the decomposition of
the domain $SH^{\rho}$, the formulation is sufficiently general to apply to other
properties that are captured by $SH$. The domain $Pos$ of positive Boolean functions and
its subdomain $Def$, the domain of definite Boolean functions, are normally used
for capturing groundness. Each Boolean variable has the value true if the
program variable it corresponds to is definitely bound to a ground term. However,
the domain $Pos$ is isomorphic to $SH$ via the mapping from formulas in $Pos$ to
the set of complements of their models [3]. This means that any general results
regarding the structure of $SH$ are equally applicable to $Pos$ and its subdomains.

To establish the results for $SH^{\rho}$, we define a general schema for subdomains
of $SH$ that includes $SH^{\rho}$ and $Def$ as special cases. This sheds new light on the
structure of the domain $SH^{\rho}$, which is smaller but significantly more involved
than $SH$.¹ Of course, as we have used the more general schematic approach,
we can immediately derive (where applicable) corresponding results for $Def$ and
$Pos$. Moreover, an interesting consequence of this work is the discovery of a
natural connection between the abstract domains $Def$ and $SH^{\rho}$. The results
confirm that $SH^{\rho}$ is, in fact, the "appropriate" domain to consider when
pair-sharing is the property of interest.

The paper is structured as follows: In Section 2 we briefly recall the required 
notions and notations, even though we assume general acquaintance with the 
topics of lattice theory, abstract interpretation, sharing analysis and groundness 
analysis. Section 3 introduces the SH domain and several abstractions of it. 
The meet-irreducible elements of an important family of abstractions of SH 
are identified in Section 4. This is required in order to apply, in Section 5, the
method of Filé and Ranzato to this family. We conclude in Section 6 with some
final remarks where, in particular, we explain what is, in our opinion, the lesson
to be learned from this and other related works.

2 Preliminaries 

For any set $S$, $\wp(S)$ denotes the power set of $S$ and $\# S$ is the cardinality of $S$.

A preorder $\preceq$ over a set $P$ is a binary relation that is reflexive and transitive.
If $\preceq$ is also antisymmetric, then it is called a partial order. A set $P$ equipped with
a partial order $\preceq$ is said to be partially ordered and sometimes written $(P, \preceq)$.
Partially ordered sets are also called posets.

Given a poset $(P, \preceq)$ and $S \subseteq P$, $y \in P$ is an upper bound for $S$ if and only
if $x \preceq y$ for each $x \in S$. An upper bound $y$ for $S$ is the least upper bound (or
lub) of $S$ if and only if, for every upper bound $y'$ for $S$, $y \preceq y'$. The lub, when
it exists, is unique. In this case we write $y = \mathrm{lub}\, S$. Lower bounds and greatest
lower bounds are defined dually.

A poset $(L, \preceq)$ such that, for each $x, y \in L$, both $\mathrm{lub}\{x, y\}$ and $\mathrm{glb}\{x, y\}$
exist, is called a lattice. In this case, lub and glb are also called, respectively,
the join and the meet operations of the lattice. A complete lattice is a lattice
$(L, \preceq)$ such that every subset of $L$ has both a least upper bound and a greatest
lower bound. The top element of a complete lattice $L$, denoted by $\top$, is such
that $\top \in L$ and $\forall x \in L : x \preceq \top$. The bottom element of $L$, denoted by $\bot$, is
defined dually.

¹ For the well acquainted with the matter: $SH$ is a powerset and hence it is
dual-atomistic; this is not the case for $SH^{\rho}$.

An algebra $(L, \wedge, \vee)$ is also called a lattice if $\wedge$ and $\vee$ are two binary
operations over $L$ that are commutative, associative, idempotent, and satisfy the
following absorption laws, for each $x, y \in L$: $x \wedge (x \vee y) = x$ and $x \vee (x \wedge y) = x$.
The two definitions of lattices are equivalent. This can be seen by setting up the
isomorphism given by: $x \preceq y \iff x \wedge y = x \iff x \vee y = y$, $\mathrm{glb}\{x, y\} \stackrel{\mathrm{def}}{=} x \wedge y$,
and $\mathrm{lub}\{x, y\} \stackrel{\mathrm{def}}{=} x \vee y$. A complete lattice $C$ is meet-continuous if for any chain
$Y \subseteq C$ and each $x \in C$, $x \wedge (\bigvee Y) = \bigvee_{y \in Y} (x \wedge y)$.

A monotone and idempotent self-map $\rho : P \to P$ over a poset $(P, \preceq)$ is called
a closure operator (or upper closure operator) if it is also extensive, namely
$\forall x \in P : x \preceq \rho(x)$. If $C$ is a complete lattice, then each upper closure operator
$\rho$ over $C$ is uniquely determined by the set of its fixpoints, that is, by its image
$\rho(C) \stackrel{\mathrm{def}}{=} \{\, \rho(x) \mid x \in C \,\}$. The set of all upper closure operators over a complete
lattice $C$, denoted by $\mathrm{uco}(C)$, forms a complete lattice ordered as follows: if
$\rho_1, \rho_2 \in \mathrm{uco}(C)$, $\rho_1 \sqsubseteq \rho_2$ if and only if $\rho_2(C) \subseteq \rho_1(C)$. We will often denote
upper closure operators by the sets of their fixpoints. The reader is referred to
[11] for an extensive treatment of closure operators.

The reduced product of two elements $\rho_1, \rho_2$ of $\mathrm{uco}(C)$ is: $\rho_1 \sqcap \rho_2 \stackrel{\mathrm{def}}{=} \mathrm{glb}\{\rho_1, \rho_2\}$.

Suppose $C$ is a meet-continuous lattice (which is the case for most domains for
abstract interpretation [5], including all the domains considered in this paper).
Then the inverse of the reduced product operation, called complementation,
is well defined and given as follows. If $\rho \sqsubseteq \rho_1$ then $\rho \sim \rho_1 \stackrel{\mathrm{def}}{=} \mathrm{lub}\{\, \rho_2 \mid \rho_1 \sqcap \rho_2 = \rho \,\}$.
Given a meet-continuous lattice $C$ and $\rho \in \mathrm{uco}(C)$, the pseudo-complement
(or, by an abuse of terminology now customary in the field, simply complement)
of $\rho$ is denoted by $id_C \sim \rho$, where $id_C$ is the identity over $C$. Let $C$ be a
meet-continuous lattice and $D_i \stackrel{\mathrm{def}}{=} \rho_{D_i}(C)$ with $\rho_{D_i} \in \mathrm{uco}(C)$ for $i = 1, \ldots, n$. Then
$\{\, D_i \mid 1 \le i \le n \,\}$ is a decomposition for $C$ if $C = D_1 \sqcap \cdots \sqcap D_n$. The
decomposition is also called minimal if, for each $k \in \mathbb{N}$ with $1 \le k \le n$ and each
$E_k \in \mathrm{uco}(C)$, $D_k \sqsubset E_k$ implies $C \sqsubset D_1 \sqcap \cdots \sqcap D_{k-1} \sqcap E_k \sqcap D_{k+1} \sqcap \cdots \sqcap D_n$.

Let $C$ be a complete lattice and $X \subseteq C$. Then $\mathrm{Moore}(X)$ denotes the Moore
completion of $X$, namely, $\mathrm{Moore}(X) \stackrel{\mathrm{def}}{=} \{\, \bigwedge Y \mid Y \subseteq X \,\}$. We say that $C$ is
meet-generated by $X$ if $C = \mathrm{Moore}(X)$. An element $x \in C$ is meet-irreducible if
$\forall y, z \in C : \big( (x = y \wedge z) \implies (x = y \text{ or } x = z) \big)$. The set of meet-irreducible
elements of a complete lattice $C$ is denoted by $\mathrm{MI}(C)$. Note that $\top \in \mathrm{MI}(C)$. An
element $x \in C$ is a dual-atom if $x \neq \top$ and, for each $y \in C$, $x \preceq y \prec \top$ implies $x = y$.
The set of dual-atoms is denoted by $\mathrm{dAtoms}(C)$. Note that $\mathrm{dAtoms}(C) \subseteq \mathrm{MI}(C)$.
The domain $C$ is dual-atomistic if $C = \mathrm{Moore}(\mathrm{dAtoms}(C))$. Thus, if $C$ is
dual-atomistic, $\mathrm{MI}(C) = \{\top\} \cup \mathrm{dAtoms}(C)$.

The following result holds [8, Theorem 4.1].

Theorem 1. If $C$ is meet-generated by $\mathrm{MI}(C)$ then $\mathrm{uco}(C)$ is pseudo-complemented
and for any $\rho \in \mathrm{uco}(C)$

$id_C \sim \rho = \mathrm{Moore}(\mathrm{MI}(C) \setminus \rho(C))$.

Another useful result from lattice theory is the following.

Theorem 2. All continuous lattices are meet-generated by their meet-irreducible
elements.

Hence we have the following corollary [8, Corollary 4.5].

Corollary 1. If $C$ is dual-atomistic then $\mathrm{uco}(C)$ is pseudo-complemented and
for any $\rho \in \mathrm{uco}(C)$

$id_C \sim \rho = \mathrm{Moore}(\mathrm{dAtoms}(C) \setminus \rho(C))$.
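
The notions above can be checked mechanically on a small finite example. The Python sketch below takes $C$ to be the power set of $\{a, b, c\}$ ordered by inclusion (so the meet is set intersection), represents an upper closure operator by its set of fixpoints, and computes a complement with the formula of Corollary 1. The function names are hypothetical, and the choice of $C$ and of the closure operator is an assumption made for the illustration.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

Elem = FrozenSet[str]


def moore(generators: Iterable[Elem], top: Elem) -> Set[Elem]:
    """Moore completion: all meets (intersections) of subsets of the generators."""
    closed = {top} | set(generators)       # the meet of the empty family is top
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(closed), 2):
            if a & b not in closed:
                closed.add(a & b)
                changed = True
    return closed


def dual_atoms(lattice: Set[Elem], top: Elem) -> Set[Elem]:
    return {x for x in lattice
            if x != top and not any(x < y < top for y in lattice)}


def meet_irreducibles(lattice: Set[Elem]) -> Set[Elem]:
    return {x for x in lattice
            if all(x != (y & z) or x in (y, z) for y in lattice for z in lattice)}


top = frozenset("abc")
lattice = {frozenset(c) for r in range(4) for c in combinations("abc", r)}

# A closure operator given by its fixpoints: the Moore family generated by {a, b}.
rho_image = moore([frozenset("ab")], top)
# Corollary 1: the complement is generated by the dual-atoms outside rho's image.
complement = moore(dual_atoms(lattice, top) - rho_image, top)
# Reduced product of two closures = Moore family generated by both images.
product = moore(rho_image | complement, top)

print(sorted("".join(sorted(x)) for x in complement))
```

In this example the reduced product of the closure operator and its complement gives back the whole lattice, as expected of a decomposition.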

Domains such as $SH$ are normally defined over a denumerable set of variables
$Vars$ and then a finite subset of $Vars$ is used to define a subset of $SH$ that is
restricted to these variables. In this paper, we assume there is a fixed and finite
set of variables of interest $VI \subset Vars$ of cardinality $n$.

If $t$ is a first-order term over $VI$, then $vars(t)$ denotes the set of variables
occurring in $t$. $Bind$ denotes the set of equations of the form $x = t$ (sometimes
written $x \mapsto t$) where $x \in VI$ and $t$ is a first-order term over $VI$. Note that we
do not impose the occur-check condition $x \notin vars(t)$, since we have proved in
[12] that this is not required to ensure correctness of the operations of $SH$ and
its derivatives. We also define $Subst \stackrel{\mathrm{def}}{=} \wp(Bind)$.



3 The Sharing Domains 

3.1 The Set-Sharing Domain SH

In this paper, since the set of relevant variables is fixed, it is assumed that
$SH$, $Def$ and $PS$ are each restricted to the finite set of variables $VI \subset Vars$.
Therefore, the domain elements for $SH$ (and hence for all its subdomains such
as $PS$ and $Def$) do not include the set of variables explicitly.

Definition 1. (The set-sharing domain SH.) The domain $SH$ is given by
$SH \stackrel{\mathrm{def}}{=} \wp(SG)$ where $SG \stackrel{\mathrm{def}}{=} \{\, S \in \wp(VI) \mid S \neq \emptyset \,\}$. $SH$ is partially ordered by
set inclusion so that the lub is given by set union and the glb by set intersection.

Note that, as we are adopting the upper closure operator approach to abstract
interpretation, all the domains we define here are ordered by subset
inclusion. In the following examples, the elements of $SH$ will always be written
in a simplified notation, omitting the inner braces. For instance, the set
$\{\{x\}, \{x, y\}, \{x, z\}, \{x, y, z\}\}$ will be written simply as $\{x, xy, xz, xyz\}$.

For the purpose of this paper, we just need the following operations over SH . 
See [1] for a precise definition of all the operations used for an analysis. 

Definition 2. (Some operations over SH.) The function bin: SH x SH -G 
SH , called binary union, is given by 

bin(s/ii, s/ 12 ) U S '2 1 S'! G shi, S 2 G s /12 }. 
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The star-union function (•)* : SH — >■ SH is given by 

sh* I S' G SG 3sh' C sh . S =[jsh'y 

For each j > 1, the j-self-union function {-y : SH — >■ SH is given by 

sy I S G SG 3sh' Gsh .{if sh' <j,S = \J sh'^ }. 

The extraction of the relevant component of an element of SH with respect to a 
subset of VI is encoded by the function rel: pf{VI) x SH — >■ SH given by 

rel(y, s/i) { S G s/i I S n R 0 }. 

The function amgu captures the effects of a binding x ^ t on an element of SH . 
Let Vx = {x}, Vt = varsft) and v^t = VxGvt- Then 

amgu(s/i, a; I— >■ t) (s/i \ (rel(ua;t, s/i)) U bin(rel(ua;, sh)*, rel(ut, s/i)*) . 

We also define the extension amgu : SH x Subst — >■ SH by 

amgu(s/i, 0 ) sh, 

amgu(s/i, {x I— >■ t} U a) amgu(amgu(s/i, x i— >■ t), cr \ {x i-G- t}) . 

The function $\mathrm{proj}\colon SH \times \wp(VI) \to SH$ that projects an element of SH onto a
subset $V \subseteq VI$ of the variables of interest is given by

\[ \mathrm{proj}(sh, V) \stackrel{\mathrm{def}}{=} \{\, S \cap V \mid S \in sh,\ S \cap V \neq \emptyset \,\}. \]

The j-self-union operator is new. We show later when it may be safely used to
replace the star-union operator. Note, in particular, that, letting j = 1, 2, and n,
we have $sh^1 = sh$, $sh^2 = \mathrm{bin}(sh, sh)$, and, as $\#\, VI = n$, $sh^n = sh^*$.
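
The operations of Definition 2 can be tried out on small examples. The following is a minimal Python sketch, not the authors' implementation: elements of SH are modelled as frozensets of frozensets of variable names, and the helper `sg` as well as the function names `bin_union`, `self_union` and `star` are assumptions introduced only for this illustration. A term is represented just by its set of variables, which is all that amgu needs.

```python
from itertools import combinations

def sg(*groups):
    """Build an element of SH from shorthand strings such as "xy"."""
    return frozenset(frozenset(g) for g in groups)

def bin_union(sh1, sh2):
    """Binary union: all pairwise unions of sharing groups."""
    return frozenset(s1 | s2 for s1 in sh1 for s2 in sh2)

def self_union(sh, j):
    """j-self-union: unions of at least 1 and at most j groups of sh."""
    out = set()
    for size in range(1, j + 1):
        for subset in combinations(sh, size):
            out.add(frozenset().union(*subset))
    return frozenset(out)

def star(sh):
    """Star-union: unions of any non-empty subset of sh."""
    return self_union(sh, len(sh))

def rel(vs, sh):
    """Sharing groups of sh that mention at least one variable of vs."""
    return frozenset(s for s in sh if s & vs)

def amgu(sh, x, t_vars):
    """Abstract effect of the binding x |-> t, where t_vars = vars(t)."""
    vx, vt = frozenset({x}), frozenset(t_vars)
    kept = sh - rel(vx | vt, sh)
    return kept | bin_union(star(rel(vx, sh)), star(rel(vt, sh)))

def proj(sh, vs):
    """Projection of sh onto the variables vs."""
    return frozenset(s & vs for s in sh if s & vs)
```

For instance, with `sh = sg("x", "y", "z")` the binding of x to a term whose only variable is y merges the groups x and y and leaves z alone, so `amgu(sh, "x", {"y"})` equals `sg("z", "xy")`.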

Since SH is a power set, SH is dual-atomistic and

\[ \mathrm{dAtoms}(SH) = \{\, SG \setminus \{S\} \mid S \in SG \,\}. \]

Example 1. Suppose $VI = \{x, y, z\}$. Then the seven dual-atoms of SH are:

$s_1 = \{y, z, xy, xz, yz, xyz\}$,
$s_2 = \{x, z, xy, xz, yz, xyz\}$,
$s_3 = \{x, y, xy, xz, yz, xyz\}$ (these three lack a singleton);

$s_4 = \{x, y, z, xz, yz, xyz\}$,
$s_5 = \{x, y, z, xy, yz, xyz\}$,
$s_6 = \{x, y, z, xy, xz, xyz\}$ (these three lack a pair);

$s_7 = \{x, y, z, xy, xz, yz\}$ (this lacks VI).

Then the meet-irreducible elements of SH are $s_1, \ldots, s_7$ together with SG, the
top element.
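
The dual-atoms of Example 1 can be enumerated mechanically from the formula for dAtoms(SH). The sketch below is only an illustration; the names `sharing_groups`, `dual_atoms_sh` and `show` are assumptions made here, and `show` prints an element of SH in the simplified notation used in the examples.

```python
from itertools import combinations

def sharing_groups(vi):
    """SG: all non-empty subsets of the variables of interest."""
    vi = sorted(vi)
    return frozenset(
        frozenset(c) for r in range(1, len(vi) + 1) for c in combinations(vi, r)
    )

def dual_atoms_sh(vi):
    """dAtoms(SH): SG with exactly one sharing group removed."""
    top = sharing_groups(vi)
    return {top - {s} for s in top}

def show(sh):
    """Render an element of SH without the inner braces, e.g. {x, xy}."""
    groups = sorted(("".join(sorted(s)) for s in sh), key=lambda g: (len(g), g))
    return "{" + ", ".join(groups) + "}"

if __name__ == "__main__":
    for atom in sorted(dual_atoms_sh("xyz"), key=show):
        print(show(atom))
```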
3.2 The Tuple-Sharing Domains

To provide a general characterization of domains such as the groundness and 
pair-sharing domains in SH , we first identify the sets of elements that have the 
same cardinality. 

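Definition 3. (Tuples of cardinality k.) For each $k \in \mathbb{N}$ with $1 \leq k \leq n$, the
overloaded functions $\mathrm{tuples}_k\colon SG \to SH$ and $\mathrm{tuples}_k\colon SH \to SH$ are defined as

\[ \mathrm{tuples}_k(S) \stackrel{\mathrm{def}}{=} \{\, T \in \wp(S) \mid \#\, T = k \,\}, \]
\[ \mathrm{tuples}_k(sh) \stackrel{\mathrm{def}}{=} \bigcup \{\, \mathrm{tuples}_k(S') \mid S' \in sh \,\}. \]

In particular, if $S \in SG$ and $sh \in SH$, let

\[ \mathrm{pairs}(S) \stackrel{\mathrm{def}}{=} \mathrm{tuples}_2(S), \]
\[ \mathrm{pairs}(sh) \stackrel{\mathrm{def}}{=} \mathrm{tuples}_2(sh). \]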

The usual domains that represent groundness and pair-sharing information 
will be shown to be special cases of the following more general domain. 

Definition 4. (The tuple-sharing domains $TS_k$.) For each $k \in \mathbb{N}$ such that
$1 \leq k \leq n$, the function $\rho_{TS_k}\colon SH \to SH$ is defined as

\[ \rho_{TS_k}(sh) \stackrel{\mathrm{def}}{=} \{\, S \in SG \mid \mathrm{tuples}_k(S) \subseteq \mathrm{tuples}_k(sh) \,\} \]

and, as $\rho_{TS_k} \in \mathrm{uco}(SH)$, it induces the lattice

\[ TS_k \stackrel{\mathrm{def}}{=} \rho_{TS_k}(SH). \]

Note that $\rho_{TS_k}(\mathrm{tuples}_k(sh)) = \rho_{TS_k}(sh)$ and that there is a one-to-one
correspondence between $TS_k$ and $\wp(\mathrm{tuples}_k(VI))$. The isomorphism is given by the
functions $\mathrm{tuples}_k\colon TS_k \to \wp(\mathrm{tuples}_k(VI))$ and $\rho_{TS_k}\colon \wp(\mathrm{tuples}_k(VI)) \to TS_k$.
Thus the domain $TS_k$ is the smallest domain that can represent properties
characterized by sets of variables of cardinality k. We now consider the tuple-sharing
domains for the cases when k = 1, 2, and n.
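
Definitions 3 and 4 translate directly into executable form. The following Python sketch is an illustration under the same modelling assumptions as before (frozensets of frozensets); the names `tuples_k` and `rho_ts`, and the explicit `vi` parameter, are choices made here rather than notation from the paper.

```python
from itertools import combinations

def sg(*groups):
    """Build an element of SH from shorthand strings such as "xy"."""
    return frozenset(frozenset(g) for g in groups)

def sharing_groups(vi):
    """SG: all non-empty subsets of the variables of interest."""
    vi = sorted(vi)
    return frozenset(
        frozenset(c) for r in range(1, len(vi) + 1) for c in combinations(vi, r)
    )

def tuples_k(sh, k):
    """All k-element subsets of the sharing groups in sh."""
    return frozenset(
        frozenset(t) for s in sh for t in combinations(sorted(s), k)
    )

def rho_ts(sh, k, vi):
    """Upper closure operator of TS_k: keep every sharing group whose
    k-tuples already occur among the k-tuples of sh."""
    allowed = tuples_k(sh, k)
    return frozenset(
        s for s in sharing_groups(vi) if tuples_k([s], k) <= allowed
    )
```

With k = 2 and `sh = sg("xy", "yz")`, every singleton is kept (it has no pairs), while xz and xyz are excluded because the pair xz does not occur in sh.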

Definition 5. (The groundness domain Con.) The upper closure operator
$\rho_{Con}\colon SH \to SH$ and the corresponding domain Con are defined as
$\rho_{Con} \stackrel{\mathrm{def}}{=} \rho_{TS_1}$ and $Con \stackrel{\mathrm{def}}{=} TS_1 = \rho_{Con}(SH)$.

This domain, which represents groundness information, is isomorphic to the
domain of conjunctions of Boolean variables. The isomorphism $\mathrm{tuples}_1$ maps
each element of Con to the set of variables that are possibly non-ground. The
usual domain (also normally called Con) for representing groundness can be
obtained (as for Pos and Def) by set complementation.
Definition 6. (The pair-sharing domain PS.) The upper closure operator
$\rho_{PS}\colon SH \to SH$ and the corresponding domain PS are defined as
$\rho_{PS} \stackrel{\mathrm{def}}{=} \rho_{TS_2}$ and $PS \stackrel{\mathrm{def}}{=} TS_2 = \rho_{PS}(SH)$.

This domain represents pair-sharing information and the isomorphism $\mathrm{tuples}_2$
maps each element of PS to the set of pairs of variables that may be bound
to terms that share a common variable. The domain for representing variable
independence can be obtained by set complementation. Finally, in the case when
k = n we have a domain consisting of just two elements:

\[ TS_n = \{\, SG,\ SG \setminus \{VI\} \,\}. \]

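Just as for SH, the domains $TS_k$ are dual-atomistic and:

\[ \mathrm{dAtoms}(TS_k) = \bigl\{\, SG \setminus \{\, U \in SG \mid T \subseteq U \,\} \bigm| T \in \mathrm{tuples}_k(VI) \,\bigr\}. \]

Thus we have

\[ \mathrm{dAtoms}(Con) = \bigl\{\, SG \setminus \{\, U \in SG \mid x \in U \,\} \bigm| x \in VI \,\bigr\}, \]
\[ \mathrm{dAtoms}(PS) = \bigl\{\, SG \setminus \{\, U \in SG \mid x, y \in U \,\} \bigm| x \neq y \in VI \,\bigr\}. \]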

Example 2. Consider Example 1. Then the dual atoms of Con are

$\{y, z, yz\}$,
$\{x, z, xz\}$,
$\{x, y, xy\}$,

and the dual atoms of PS are

$\{x, y, z, xz, yz\}$,
$\{x, y, z, xy, yz\}$,
$\{x, y, z, xy, xz\}$.
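
The dual atoms listed in Example 2 follow from the formula for $\mathrm{dAtoms}(TS_k)$: for each k-tuple T, remove from SG every sharing group that contains T. A small Python sketch of that construction (the function names are assumptions made for this illustration):

```python
from itertools import combinations

def sg(*groups):
    """Build an element of SH from shorthand strings such as "xy"."""
    return frozenset(frozenset(g) for g in groups)

def sharing_groups(vi):
    """SG: all non-empty subsets of the variables of interest."""
    vi = sorted(vi)
    return frozenset(
        frozenset(c) for r in range(1, len(vi) + 1) for c in combinations(vi, r)
    )

def dual_atoms_ts(k, vi):
    """dAtoms(TS_k): for each k-tuple T, SG minus all supersets of T."""
    top = sharing_groups(vi)
    k_tuples = [t for t in top if len(t) == k]
    return {top - {u for u in top if t <= u} for t in k_tuples}
```

Calling `dual_atoms_ts(1, "xyz")` yields the three dual atoms of Con and `dual_atoms_ts(2, "xyz")` the three dual atoms of PS given above.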

It can be seen from the dual atoms that the precision of the information
encoded by domains $TS_j$ and $TS_k$ is not comparable when $j \neq k$. Also, we note
that, if $j < k$, then $\rho_{TS_j}(TS_k) = \{SG\}$ and $\rho_{TS_k}(TS_j) = TS_j$.

3.3 The Tuple-Sharing Dependency Domains 

We now need to define domains that capture the propagation of groundness and 
pair-sharing; in particular, the dependency of these properties on the further 
instantiation of the variables. In the same way as with TSk for Con and PS, we 
first define a general subdomain TSDk of SH . This must be safe with respect 
to the tuple-sharing property represented by TSk when performing the usual 
abstract operations. This was the motivation behind the introduction in [1] of
the pair-sharing dependency domain $SH^{\rho}$. We now generalize this for
tuple-sharing.
Definition 7. (The tuple-sharing dependency domain $TSD_k$.) For each
k where $1 \leq k \leq n$, the function $\rho_{TSD_k}\colon SH \to SH$ is defined as

\[ \rho_{TSD_k}(sh) \stackrel{\mathrm{def}}{=} \Bigl\{\, S \in SG \Bigm| \forall T \subseteq S : \#\, T < k \implies S = \textstyle\bigcup \{\, U \in sh \mid T \subseteq U \subseteq S \,\} \,\Bigr\}, \]

and, as $\rho_{TSD_k} \in \mathrm{uco}(SH)$, it induces the tuple-sharing dependency lattice

\[ TSD_k \stackrel{\mathrm{def}}{=} \rho_{TSD_k}(SH). \]

It follows from the definitions that the domains $TSD_k$ form a strict chain.

Proposition 1. For $j, k \in \mathbb{N}$ with $1 \leq j < k \leq n$, we have $TSD_j \subset TSD_k$.

Moreover, $TSD_k$ is not less precise than $TS_k$.

Proposition 2. For $k \in \mathbb{N}$ with $1 \leq k \leq n$, we have $TS_k \subseteq TSD_k$.
Furthermore, if $n > 1$ then $TS_k \subset TSD_k$.

A consequence of Propositions 1 and 2 is that $TSD_k$ is not less precise than
$TS_1 \sqcap \cdots \sqcap TS_k$.

Corollary 2. For $j, k \in \mathbb{N}$ with $1 \leq j \leq k \leq n$, we have $TS_j \subset TSD_k$.

It also follows from the definitions that, for the $TSD_k$ domain, the star-union
operator can be replaced by the k-self-union operator.

Proposition 3. For $1 \leq k \leq n$, we have $\rho_{TSD_k}(sh^k) = sh^*$.
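
Proposition 3 can be checked by brute force on a small set of variables. The sketch below is a direct, unoptimised reading of Definition 7 in Python; it is an illustration only, and the names `rho_tsd`, `self_union` and `star` are assumptions of this sketch. The check enumerates every element of SH over three variables.

```python
from itertools import combinations

def sharing_groups(vi):
    """SG: all non-empty subsets of the variables of interest."""
    vi = sorted(vi)
    return frozenset(
        frozenset(c) for r in range(1, len(vi) + 1) for c in combinations(vi, r)
    )

def self_union(sh, j):
    """j-self-union: unions of at least 1 and at most j groups of sh."""
    out = set()
    for size in range(1, j + 1):
        for subset in combinations(sh, size):
            out.add(frozenset().union(*subset))
    return frozenset(out)

def star(sh):
    """Star-union: unions of any non-empty subset of sh."""
    return self_union(sh, len(sh))

def rho_tsd(sh, k, vi):
    """Definition 7: S is kept when, for every T within S with fewer than
    k elements (including the empty set), S is the union of the groups U
    of sh such that T is within U and U is within S."""
    out = set()
    for s in sharing_groups(vi):
        elems = sorted(s)
        small = [
            frozenset(c)
            for r in range(0, min(k, len(elems) + 1))
            for c in combinations(elems, r)
        ]
        if all(
            frozenset().union(*[u for u in sh if t <= u <= s]) == s
            for t in small
        ):
            out.add(s)
    return frozenset(out)

def check_proposition_3(vi):
    """Exhaustively compare rho_tsd(sh^k) with sh* for every sh and k."""
    top = sorted(sharing_groups(vi), key=sorted)
    for r in range(len(top) + 1):
        for choice in combinations(top, r):
            sh = frozenset(choice)
            for k in range(1, len(vi) + 1):
                if rho_tsd(self_union(sh, k), k, vi) != star(sh):
                    return False
    return True
```

Note that the closure is applied to $sh^k$, not to sh itself: for sh = {x, y} and k = 2 the closure of sh is sh, whereas the closure of $sh^2$ is $sh^* = \{x, y, xy\}$.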

We consider the tuple-sharing dependency domains for the cases when k = 1, 
2, and n. 

Definition 8. (The ground dependency domain Def.) The upper closure
operator $\rho_{Def}\colon SH \to SH$ and the corresponding domain Def are defined as
$\rho_{Def} \stackrel{\mathrm{def}}{=} \rho_{TSD_1}$ and $Def \stackrel{\mathrm{def}}{=} TSD_1 = \rho_{Def}(SH)$.

By Proposition 3, we have for all $sh \in SH$, $\rho_{Def}(sh) = sh^*$ so that $TSD_1$ is a
representation of the domain Def used for capturing groundness. It also confirms
the well-known result that the star-union operator is redundant for elements in
Def.

Definition 9. (The pair-sharing dependency domain PSD.) The upper
closure operator $\rho_{PSD}\colon SH \to SH$ and the corresponding domain PSD are defined
as $\rho_{PSD} \stackrel{\mathrm{def}}{=} \rho_{TSD_2}$ and $PSD \stackrel{\mathrm{def}}{=} TSD_2 = \rho_{PSD}(SH)$.

Then, it follows from [1, Theorem 7] that PSD is in fact the domain,
corresponding to $SH^{\rho}$, defined for capturing pair-sharing. By Proposition 3, we have
for all $sh \in SH$, $\rho_{PSD}(sh^2) = sh^*$, thus confirming the result in [1] that for
elements in PSD, the star-union operator $sh^*$ can be replaced by the 2-self-union
$sh^2 = \mathrm{bin}(sh, sh)$ without any loss of precision. Furthermore, Corollary 2
confirms the result established in [1] that PSD also captures groundness. Finally,
letting k = n, we observe that $TSD_n = SH$.

In [1], the PSD domain is shown to be as good as SH for both representing 
and propagating pair-sharing. It is also proved that any weaker domain does not 
satisfy these properties, so that PSD is the quotient [6] of SH with respect to 
the pair-sharing property PS. In view of recent results on abstract domain
completeness [9], this also means that PSD is the least fully-complete extension
(lfce) of PS with respect to SH.

From a purely theoretical point of view, the quotient of an abstract interpreta- 
tion with respect to a property of interest and the least fully-complete extension 
of an upper closure operator are not equivalent. It is known [6] that the quotient
may not exist, while the lfce is always defined. However, it is also known [10] that
when the quotient exists it is exactly the same as the lfce. Moreover, it should
be noted that the quotient will exist as long as we consider a semantics where at 
least one of the domain operators is additive and this is almost always the case 
(just consider the merge-over-all-paths operator, usually implemented as the lub 
of the domain). Therefore, for the domains considered here, these two approaches 
to the completeness problem in abstract interpretation are equivalent. 

We now generalize and strengthen the result in [1] and show that, for each
$k \in \{1, \ldots, n\}$, $TSD_k$ is the quotient of SH with respect to the reduced product
$TS_1 \sqcap \cdots \sqcap TS_k$ (equivalently, the lfce of $TS_1 \sqcap \cdots \sqcap TS_k$ with respect to SH).

The following results can be obtained by generalizing the corresponding re- 
sults in [1]. 

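Theorem 3. Let $sh_1, sh_2 \in SH$ and $1 \leq k \leq n$. If $\rho_{TSD_k}(sh_1) = \rho_{TSD_k}(sh_2)$
then, for each $\sigma \in Subst$, each $sh' \in SH$, and each $V \in \wp_f(VI)$,

\[ \rho_{TSD_k}\bigl(\mathrm{amgu}(sh_1, \sigma)\bigr) = \rho_{TSD_k}\bigl(\mathrm{amgu}(sh_2, \sigma)\bigr), \]

\[ \rho_{TSD_k}(sh' \cup sh_1) = \rho_{TSD_k}(sh' \cup sh_2), \]

\[ \rho_{TSD_k}\bigl(\mathrm{proj}(sh_1, V)\bigr) = \rho_{TSD_k}\bigl(\mathrm{proj}(sh_2, V)\bigr). \]

Theorem 4. Let $1 \leq k \leq n$. For each $sh_1, sh_2 \in SH$, $\rho_{TSD_k}(sh_1) \neq \rho_{TSD_k}(sh_2)$
implies

\[ \exists \sigma \in Subst,\ \exists j \in \{1, \ldots, k\} \;.\; \rho_{TS_j}\bigl(\mathrm{amgu}(sh_1, \sigma)\bigr) \neq \rho_{TS_j}\bigl(\mathrm{amgu}(sh_2, \sigma)\bigr). \]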

4 The Meet-Irreducible Elements 

In Section 5, we use the method of Filé and Ranzato [8] to decompose the
dependency domains TSDk. In preparation for this, in this section, we identify 
the meet-irreducible elements for the domains and state some general results. 

We have already observed that $TS_k$ and $TSD_n = SH$ are dual-atomistic.
However, $TSD_k$, for $k < n$, is not dual-atomistic and we need to identify the
meet-irreducible elements. In fact, the set of dual-atoms for $TSD_k$ is

\[ \mathrm{dAtoms}(TSD_k) = \{\, SG \setminus \{S\} \mid S \in SG,\ \#\, S \leq k \,\}. \]
Note that $\#\, \mathrm{dAtoms}(TSD_k) = \sum_{j=1}^{k} \binom{n}{j}$. Specializing this for k = 1 and k = 2,
respectively, we have:

\[ \mathrm{dAtoms}(Def) = \{\, SG \setminus \{\{x\}\} \mid x \in VI \,\}, \]
\[ \mathrm{dAtoms}(PSD) = \{\, SG \setminus \{S\} \mid S \in \mathrm{pairs}(VI) \,\} \cup \mathrm{dAtoms}(Def), \]

and that $\#\, \mathrm{dAtoms}(Def) = n$ and $\#\, \mathrm{dAtoms}(PSD) = n(n+1)/2$. We present
as an example of this the dual atoms for Def and PSD when n = 3.

Example 3. Consider Example 1. Then the 3 dual-atoms for Def are $s_1, s_2, s_3$
and the 6 dual atoms for PSD are $s_1, \ldots, s_6$. Note that these are not all the
meet-irreducible elements since sets that do not contain xyz such as $\bot = \rho_{Def}(\bot) = \emptyset$
and $\{x\}$ cannot be obtained by the meet (which is set intersection) of a set of
dual-atoms. Thus, unlike Con and PS, neither Def nor PSD is dual-atomistic.

Consider next the subset $M_k$ of meet-irreducible elements of $TSD_k$ that are
neither the top element SG nor dual atoms. $M_k$ has an element for each sharing
group $S \in SG$ such that $\#\, S > k$ and each tuple $T \subset S$ with $\#\, T = k$. Such
an element is obtained from SG by removing all sharing groups U such that
$T \subseteq U \subseteq S$. Formally,

\[ M_k \stackrel{\mathrm{def}}{=} \bigl\{\, SG \setminus \{\, U \in SG \mid T \subseteq U \subseteq S \,\} \bigm| T, S \in SG,\ T \subset S,\ \#\, T = k \,\bigr\}. \]

Note that, as there are $\binom{n}{k}$ possible choices for T and $2^{n-k} - 1$ possible choices
for S, we have $\#\, M_k = \binom{n}{k}(2^{n-k} - 1)$ and $\#\, \mathrm{MI}(TSD_k) = \sum_{j=0}^{k-1} \binom{n}{j} + \binom{n}{k} 2^{n-k}$.
Again, we illustrate this definition with the case when n = 3.

Example 4. Consider again Example 3. First, consider the domain Def. The
meet-irreducible elements which are not dual-atoms, beside SG, are the following:

$q_1 = \{y, z, xz, yz, xyz\} \subset s_1$,
$q_2 = \{y, z, xy, yz, xyz\} \subset s_1$,   $r_1 = \{y, z, yz\} \subset q_1 \cap q_2$,

$q_3 = \{x, z, xz, yz, xyz\} \subset s_2$,
$q_4 = \{x, z, xy, xz, xyz\} \subset s_2$,   $r_2 = \{x, z, xz\} \subset q_3 \cap q_4$,

$q_5 = \{x, y, xy, yz, xyz\} \subset s_3$,
$q_6 = \{x, y, xy, xz, xyz\} \subset s_3$,   $r_3 = \{x, y, xy\} \subset q_5 \cap q_6$.

Next, consider the domain PSD. The only meet-irreducible elements that are
not dual-atoms, beside SG, are the following:

$m_1 = \{x, y, z, xz, yz\} \subset s_4$,
$m_2 = \{x, y, z, xy, yz\} \subset s_5$,
$m_3 = \{x, y, z, xy, xz\} \subset s_6$.

Each of these lacks a pair and none contains the sharing group xyz.
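
The sets $\mathrm{dAtoms}(TSD_k)$ and $M_k$, and the counting formulas given for them, can be reproduced by constructing the sets explicitly. The following Python sketch does this; the function names are assumptions of this illustration, and the code only builds the sets from their definitions (it does not independently verify meet-irreducibility).

```python
from itertools import combinations
from math import comb

def sg(*groups):
    """Build an element of SH from shorthand strings such as "xy"."""
    return frozenset(frozenset(g) for g in groups)

def sharing_groups(vi):
    """SG: all non-empty subsets of the variables of interest."""
    vi = sorted(vi)
    return frozenset(
        frozenset(c) for r in range(1, len(vi) + 1) for c in combinations(vi, r)
    )

def dual_atoms_tsd(k, vi):
    """dAtoms(TSD_k): SG minus one sharing group of size at most k."""
    top = sharing_groups(vi)
    return {top - {s} for s in top if len(s) <= k}

def m_k(k, vi):
    """M_k: SG minus the interval of groups U with T within U within S,
    for each k-tuple T and each strict superset S of T."""
    top = sharing_groups(vi)
    out = set()
    for t in top:
        if len(t) != k:
            continue
        for s in top:
            if t < s:
                out.add(top - {u for u in top if t <= u <= s})
    return out

def expected_mi_count(n, k):
    """The closed formula given in the text for #MI(TSD_k)."""
    return sum(comb(n, j) for j in range(k)) + comb(n, k) * 2 ** (n - k)
```

For n = 3 this gives 3 and 6 dual atoms for Def and PSD, 9 and 3 elements for $M_1$ and $M_2$, and the element $r_1 = \{y, z, yz\}$ of Example 4 appears in `m_k(1, "xyz")`.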
We now show that we have identified precisely all the meet-irreducible
elements of $TSD_k$.

Theorem 5. If $k \in \mathbb{N}$ with $1 \leq k \leq n$, then

\[ \mathrm{MI}(TSD_k) = \{SG\} \cup \mathrm{dAtoms}(TSD_k) \cup M_k. \]

As a consequence, we have the following result.

Corollary 3. Let $k \in \mathbb{N}$ with $1 \leq k \leq n$. Then

\[ \mathrm{dAtoms}(TS_k) = \{\, sh \in \mathrm{MI}(TSD_k) \mid VI \notin sh \,\}. \]

For the decomposition, we need to identify which meet-irreducible elements
of $TSD_k$ are in $TS_j$. Using Corollaries 2 and 3 we have the following result.

Corollary 4. If $j, k \in \mathbb{N}$ with $1 \leq j < k \leq n$, then $\mathrm{MI}(TSD_k) \cap TS_j = \{SG\}$.

By combining Proposition 1 with Theorem 5 we can identify the meet-irreducible
elements of $TSD_k$ that are in $TSD_j$, where $j < k$.

Corollary 5. If $j, k \in \mathbb{N}$ with $1 \leq j < k \leq n$, then

\[ \mathrm{MI}(TSD_k) \cap TSD_j = \mathrm{dAtoms}(TSD_j). \]

5 The Decomposition of the Domains 

5.1 Removing the Tuple-Sharing Domains 

We first consider the decomposition of $TSD_k$ with respect to $TS_j$. It follows from
Theorem 1 and Corollaries 2 and 4 that, for $1 \leq j < k \leq n$, we have

\[ TSD_k \sim TS_j = TSD_k. \tag{1} \]

Since $SH = TSD_n$, we have, using Eq. (1) and setting k = n, that, if $j < n$,

\[ SH \sim TS_j = SH. \tag{2} \]

Thus, in general, $TS_j$ is too abstract to be removed from SH by means of
complementation. (Note that here it is required that $j < n$, because we have
$SH \sim TS_n \neq SH$.) In particular, letting j = 1, 2 (assuming $n > 2$) in Eq. (2), we have

\[ SH \sim PS = SH \sim Con = SH, \tag{3} \]

showing that Con and PS are too abstract to be removed from SH by means of
complementation. Also, by Eq. (1), letting j = 1 and k = 2 it follows that the
complement of Con in PSD is PSD.

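Now consider decomposing $TSD_k$ using $TS_k$. It follows from Theorem 1,
Proposition 2 and Corollary 3 that, for $1 \leq k \leq n$, we have

\[ \begin{aligned} TSD_k \sim TS_k &= \mathrm{Moore}\bigl( \mathrm{MI}(TSD_k) \setminus \rho_{TS_k}(TSD_k) \bigr) \\ &= \{\, sh \in TSD_k \mid VI \in sh \,\}. \end{aligned} \tag{4} \]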
Thus we have

\[ TSD_k \sim (TSD_k \sim TS_k) = TS_k. \tag{5} \]

We have therefore extracted all the domain $TS_k$ from $TSD_k$. So by letting k = 1,
2 in Eq. (5), we have found the complements of Con in Def and PS in PSD:

\[ Def \sim Con = \{\, sh \in Def \mid VI \in sh \,\}, \]
\[ PSD \sim PS = \{\, sh \in PSD \mid VI \in sh \,\}. \]

Thus if we denote the domains induced by these complements as $Def^{\oplus}$ and
$PSD^{\oplus}$, respectively, we have the following result.

Theorem 6.

\[ Def \sim Con = Def^{\oplus}, \qquad Def \sim Def^{\oplus} = Con, \]
\[ PSD \sim PS = PSD^{\oplus}, \qquad PSD \sim PSD^{\oplus} = PS. \]

Moreover, Con and $Def^{\oplus}$ form a minimal decomposition for Def and, similarly,
PS and $PSD^{\oplus}$ form a minimal decomposition for PSD.
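
The first line of Theorem 6 can be replayed for $VI = \{x, y, z\}$. The Python sketch below is an illustration under stated assumptions: it takes the description of MI(Def) from Theorem 5 as given, computes a complement as the Moore closure of the meet-irreducible elements lying outside the domain being removed (the form used in Eq. (4)), and uses the fact that Def consists of the star-closed elements of SH. All names (`star`, `rho_con`, `moore`, `DEF`, `CON`, `MI_DEF`) are introduced here for the example.

```python
from itertools import combinations

VI = frozenset("xyz")
TOP = frozenset(
    frozenset(c) for r in range(1, len(VI) + 1) for c in combinations(sorted(VI), r)
)

def star(sh):
    """Close sh under union of sharing groups (rho_Def)."""
    closed = set(sh)
    while True:
        new = {a | b for a in closed for b in closed} - closed
        if not new:
            return frozenset(closed)
        closed |= new

def rho_con(sh):
    """rho_Con: every group built from variables that occur in sh."""
    seen = frozenset().union(*sh)
    return frozenset(s for s in TOP if s <= seen)

def moore(family):
    """Moore closure: add the top element and close under intersection."""
    closed = set(family) | {TOP}
    while True:
        new = {a & b for a in closed for b in closed} - closed
        if not new:
            return closed
        closed |= new

# The domains Def and Con as sets of elements of SH.
DEF = {star(frozenset(c)) for r in range(len(TOP) + 1) for c in combinations(TOP, r)}
CON = {rho_con(d) for d in DEF}

# MI(Def) as described by Theorem 5 with k = 1.
dual_atoms = {TOP - {s} for s in TOP if len(s) == 1}
m_1 = {
    TOP - {u for u in TOP if t <= u <= s}
    for t in TOP if len(t) == 1
    for s in TOP if t < s
}
MI_DEF = {TOP} | dual_atoms | m_1

# Def ~ Con, and then Def ~ (Def ~ Con).
def_complement = moore(m for m in MI_DEF if m not in CON)
con_again = moore(m for m in MI_DEF if m not in def_complement)
```

Here `def_complement` coincides with the elements of Def that contain VI, and `con_again` coincides with Con, matching the two equalities stated for Def.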

5.2 Removing the Dependency Domains 

First we note that, by Theorem 5, Proposition 1 and Corollary 5 the complement
of $TSD_j$ in $TSD_k$, where $1 \leq j < k \leq n$, is given as follows:

\[ \begin{aligned} TSD_k \sim TSD_j &= \mathrm{Moore}\bigl( \mathrm{MI}(TSD_k) \setminus \rho_{TSD_j}(TSD_k) \bigr) \\ &= \{\, sh \in TSD_k \mid \forall S \in SG : \#\, S \leq j \implies S \in sh \,\}. \end{aligned} \tag{6} \]

It therefore follows from Eq. (6) and setting k = n that the complement of $\rho_{TSD_j}$
in SH for $j < n$ is:

\[ SH \sim TSD_j = \{\, sh \in SH \mid \forall S \in SG : \#\, S \leq j \implies S \in sh \,\} \stackrel{\mathrm{def}}{=} SH^{+}_{TSD_j}. \tag{7} \]

In particular, in Eq. (7) when j = 1, we have the following result for Def (also
proved in [8, Lemma 5.4]):

\[ SH \sim Def = \{\, sh \in SH \mid \forall x \in VI : \{x\} \in sh \,\}. \]

This complement is denoted by $SH^{+}_{Def}$. Also, in Eq. (7) when j = 2, we have the
following result for PSD:

\[ SH \sim PSD = \{\, sh \in SH \mid \forall S \in SG : \#\, S \leq 2 \implies S \in sh \,\}. \]

This complement is denoted by $SH^{+}_{PSD}$.

We next construct the complement of PSD with respect to Def. By Eq. (6),

\[ PSD \sim Def = \{\, sh \in PSD \mid \forall x \in VI : \{x\} \in sh \,\} \stackrel{\mathrm{def}}{=} PSD^{+}. \]

Then the complement factor $Def^{-} \stackrel{\mathrm{def}}{=} PSD \sim PSD^{+}$ is exactly the same as
$SH \sim SH^{+}_{Def}$, and PSD and SH behave similarly for Def.
5.3 Completing the Decomposition

Just as for SH, the complement of $SH^{+}_{Def}$ using PS (or, more generally, $TS_j$ where
$1 < j < n$) is $SH^{+}_{Def}$. By Corollary 3 and Theorem 1, as PS is dual-atomistic,
the complement of PS in $PSD^{+}$ is given as follows.

Theorem 7.

\[ \begin{aligned} PSD^{\ddagger} &\stackrel{\mathrm{def}}{=} PSD^{+} \sim PS \\ &= \{\, sh \in PSD \mid VI \in sh,\ \forall x \in VI : \{x\} \in sh \,\}, \end{aligned} \]

\[ PSD^{+} \sim PSD^{\ddagger} = PS. \]

So, we have extracted all the domain PS from $PSD^{+}$ and we have the following
result.

Corollary 6. $Def^{-}$, PS, and $PSD^{\ddagger}$ form a minimal decomposition for PSD.

6 Conclusion 

In [1], we said that $PSD \sim PS \neq PSD$. This paper now clarifies that statement.
We have provided a minimal decomposition for PSD whose components include
$Def^{-}$ and PS. Moreover, we have shown that PSD is not dual-atomistic. The
meet-irreducible elements of PSD have been completely specified. As a
consequence, it can be seen that the dual-atoms of PSD are precisely those elements
which have the form $SG \setminus \{S\}$ where $\#\, S \leq 2$.

By studying the sharing domain SH in a more general framework, we have 
been able to show that the domain PSD has a natural place in a scheme of domains
based on SH . Moreover, by means of this approach we have highlighted the close 
relationship between Def and PSD and the many properties they share. 

Our starting point was the work of Filé and Ranzato. In [8], they noted, as
we have, that $SH^{+}_{Def} \sim PS = SH^{+}_{Def}$ so that none of the domain PS could be
extracted from $SH^{+}_{Def}$. They noticed that $\rho_{PS}$ maps all dual-atoms that contain VI
to SG and thus loses all pair-sharing information. To avoid this, they replaced PS
with the domain PS' where, for all $sh \in SH^{+}_{Def}$, $\rho_{PS'}(sh) \stackrel{\mathrm{def}}{=} \rho_{PS}(sh) \setminus (\{VI\} \setminus sh)$,
and noted that $SH^{+}_{Def} \sim PS' = \{\, sh \in SH^{+}_{Def} \mid VI \in sh \,\}$. To understand the
nature of this new domain PS', we first observe that PS' is simply the reduced
product of PS and $TS_n$. This is because $TS_n = \mathrm{MI}(TS_n) = \{\, SG \setminus \{VI\},\ SG \,\}$.
In addition, $SH^{+}_{Def} \sim TS_n = \{\, sh \in SH^{+}_{Def} \mid VI \in sh \,\}$, which is precisely the
same as $SH^{+}_{Def} \sim PS'$. Thus, since $SH^{+}_{Def} \sim PS = SH^{+}_{Def}$, it is not surprising that
it is precisely the added component $TS_n$ that is removed when we compute the
complement for $SH^{+}_{Def}$ with respect to PS'.

We also note that, in [8], the fact that one of the components found by 
decomposition (here called $TS_n$) has only two elements is seemingly regarded as
a positive thing, because [8, Section 1] “This shows that domain decomposition 
can lead to great gains in the size of the representation.” In our humble opinion, 
if one of the components is very small (e.g., only 2 elements in this case) this
means that almost all the complexity remained elsewhere. If the objective is 
to decompose in order to find space saving representations for domains and to 
simplify domain verification problems, then “balanced” decompositions (that is, 
into parts of nearly equal complexity) should instead be regarded as the best 
one can hope for. 

It should be stressed that the problems outlined above are not dependent on 
the particular domain chosen and, in our opinion, they are mainly related to the 
methodology for decomposing a domain. Indeed, we argue that complementation 
alone is not sufficient to obtain truly minimal decompositions of domains. The
reason is that complementation only depends on the domain’s data (that
is, the domain elements and the partial order relation modeling their intrinsic 
precision), while it is completely independent from the domain operators that 
manipulate that data. In particular, if the concrete domain contains elements 
that are redundant with respect to its operators (because the observable behavior 
of these elements is exactly the same in all possible program contexts) then 
any factorization of the domain obtained by complementation will encode this 
redundancy. However, the theoretical solution to this problem is well known 
[6,9,10] and it is straightforward to improve the methodology so as to obtain 
truly minimal decompositions: first remove all redundancies from the domain 
(this can be done by computing the quotient of a domain with respect to the 
observable behavior) and only then decompose it by complementation. This is 
exactly what is done here. 

We conclude with some remarks on complementation. There are a number of 
reasons why we believe this is important. First of all, complementation is really 
an excellent concept to work with from a theoretical point of view: it allows the 
splitting of complex domains into simpler components, avoiding redundancies 
between them. However, as things stand at present, complementation has never 
been exploited. This may be because it is easier to implement a single complex 
domain than to implement several simpler domains and integrate them together. 
Note that complementation requires the implementation of a full-integration 
between components (i.e., the reduced product), otherwise precision would be 
lost and the theoretical results would not apply. 

One notable example of domain decomposition that does enable significant 
memory and time savings with no precision loss is the GER decomposition of 
Pos [2], and this is not based on complementation. Observe that the complement 
of G with respect to Pos is Pos itself. This is because Pos is isomorphic to SH
[3] and $G \stackrel{\mathrm{def}}{=} Con = TS_1$ so that, by Eq. (3), $Pos \sim G = Pos$. It is not difficult
to observe that the same phenomenon happens if one considers the groundness
equivalence component E, that is, $Pos \sim E = Pos$. In fact, it can be shown
that two variables x and y are ground-equivalent in $sh \in SH \equiv Pos$ if and
only if $\mathrm{rel}(\{x\}, sh) = \mathrm{rel}(\{y\}, sh)$. In particular, this implies both $\{x\} \notin sh$ and
$\{y\} \notin sh$. Thus, it can be easily observed that in all the dual-atoms of Pos no
variable is ground-equivalent to another variable (because each dual-atom lacks 
just a single sharing-group). 
Abstract. Linear refinement is a technique for systematically constructing
abstract domains for program analysis directly from a basic domain
representing just the property of interest. This paper for the first time 
uses linear refinement to construct a new domain instead of just recon- 
structing an existing one. This domain is designed for definite freeness 
analysis and consists of sets of dependencies between sets of variables. 
We provide explicit definitions of the operations for this domain which 
we show to be safe with respect to the concrete operations. We illustrate 
how the domain may be used in a practical analysis by means of a small 
example. 

Keywords: Abstract interpretation, abstract domain, linear refinement,
static analysis, freeness analysis, logic programming. 



1 Introduction 

Linear refinement [13] is a technique for systematically constructing abstract 
domains for program analysis. Given a basic abstract domain representing just 
the property of interest together with an appropriate concrete operation (which, 
as we are analysing logic programs, is usually unification) the new more refined 
domain is constructed. 

Freeness analysis is concerned with the computation of a set of variables 
(called free variables) which, at a given program point, are guaranteed to be 
only bound to variables. As pointed out in [20], the information collected by 
a freeness analysis is useful for increasing the power of non strict independent 
AND-parallelism [14], for optimising unification, for goal ordering and for the 
avoidance of type checking. Moreover, for sharing analysis (which deals with the 
possible sharing of terms among variables), freeness is an important component 
that can improve both the efficiency and the precision of the analyser. 

Freeness analysis has received relatively little attention from the research 
community, compared for instance with groundness analysis [1,6]. If we consider 
the freeness domains proposed in the literature, we find that two different
approaches are taken. The first considers freeness as a mode [9,20]. In this case, a
substitution is simply abstracted as the set of free variables (that is, those which 
are mapped by the substitution to variables). This basic approach is extremely 
imprecise. To improve precision, [9] combines modes with the information about 
the set of variables present in the term bound to each variable, whereas [20] 
combines freeness with sharing analysis, to improve both the sharing and the 
freeness components. Both [9] and [20] show that the basic freeness abstraction 
is not acceptable at all. This is because even if it is known that a variable is
free in the substitutions $\theta_1$ and $\theta_2$, there is no guarantee that it will be free
in their most general unifier. Then the most precise abstraction of unification
is so imprecise as to be useless. Compare this with the case of groundness, where
the simple abstraction of a substitution into the set of variables that are ground 
allows the definition of an abstract unification operation which is imprecise but 
still useful. The second approach abstracts a substitution into an abstract set of 
equations, which can be seen as an approximate description of the concrete set 
of equations represented by the substitution itself [3,4,5,19,23]. This means that 
part of the functor structure of the substitution is maintained in the abstract 
sets of equations, resulting in an extremely precise analysis but with rather com- 
plex abstract operations. In [15] an intermediate approach is proposed. Here the 
terms are fully abstracted apart from the positions of their variables. These are 
maintained by means of paths that define their positions in the term. 

The contribution of this paper is in the definition of a new domain for definite 
freeness analysis which is able to express freeness dependencies without the help 
of any auxiliary domain (like sharing, for instance). This domain is constructed 
by means of a linear refinement operation, as done in the case of groundness [21] 
and types [17]. We provide a representation for this domain and safe operations 
on this representation. It can be shown that our domain is contained in the 
Sharing × Free domain of [16]. This means that its precision is no more than that
of [16]. The advantage of our domain is its natural definition, together with the 
simplicity of its representation and its abstract operators. Note that, previously, 
only a restricted form of linear refinement (called the Heyting refinement) has 
been used. It was applied to the reconstruction of the domain Pos for groundness 
analysis [21] and the construction of type domains [17]. 

The paper is organised as follows. Section 2 provides some preliminary def- 
initions. Section 3 defines the concrete domain and its abstraction into a basic 
domain for freeness analysis. Section 4 refines this basic domain through the 
standard approach for linear refinement and shows that this does not lead to 
interesting domains. Section 5 shows how to overcome this problem through the 
use of internal dependencies. Section 6 defines a representation of our domain of 
internal dependencies based on sets of dependencies between variables. Sections 
7 and 8 define the algorithms for the computation of the abstraction function and 
of the abstract operators on the representation, respectively. Section 9 provides 
an example of freeness analysis of logic programs based on our domain.
Section 10 summarises the contributions of this paper, compares our work with the
standard framework for freeness analysis, which is based on the Sharing × Free
domain, and considers the problem of condensing for our domain.

2 Preliminaries 

2.1 Terms and substitutions 

Given a set of variables V and a set of function symbols $\Sigma$ with associated
arity, we define $\mathrm{terms}(\Sigma, V)$ as the minimal set of terms built from V and $\Sigma$ as:
$v \in \mathrm{terms}(\Sigma, V)$ for every $v \in V$ and if $t_1, \ldots, t_n \in \mathrm{terms}(\Sigma, V)$ and $f \in \Sigma$ is a
function symbol of arity $n \geq 0$, then $f(t_1, \ldots, t_n) \in \mathrm{terms}(\Sigma, V)$. $\mathrm{vars}(t)$ is the
set of variables occurring in t. A term t is ground if $\mathrm{vars}(t) = \emptyset$. Given a set of
variables V and a variable x, $V \cup x$ means $V \cup \{x\}$ and $V - x$ means $V - \{x\}$.

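A substitution $\theta$ is a map from variables into terms. We define the sets
$\mathrm{dom}(\theta) = \{\, x \mid \theta(x) \neq x \,\}$ and $\mathrm{rng}(\theta) = \bigcup_{x \in \mathrm{dom}(\theta)} \mathrm{vars}(\theta(x))$. We require $\mathrm{dom}(\theta)$
to be finite. We define $\Theta_V$ as the set of idempotent substitutions $\theta$ such that
$\mathrm{dom}(\theta) \cup \mathrm{rng}(\theta) \subseteq V$ and $\mathrm{dom}(\theta) \cap \mathrm{rng}(\theta) = \emptyset$. Given $\theta$ and a set of variables
R, we define $\theta|_R$ as $\theta|_R(x) = \theta(x)$ if $x \in R$ and $\theta|_R(x) = x$ otherwise. Given a
term $t \in \mathrm{terms}(\Sigma, V)$ and $\theta \in \Theta_V$, $t\theta \in \mathrm{terms}(\Sigma, V)$ is the term obtained with
parallel substitution of every variable x in t with $\theta(x)$.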
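
These definitions can be made concrete with a small Python sketch. The representation is an assumption of this illustration, not something fixed by the paper: a variable is a string, a compound term is a tuple whose first component is the function symbol, and a substitution is a dictionary from variables to terms (variables not in the dictionary are mapped to themselves).

```python
def term_vars(t):
    """vars(t): the set of variables occurring in the term t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(arg) for arg in t[1:]))

def apply_subst(theta, t):
    """t theta: replace every variable x of t by theta(x), in parallel."""
    if isinstance(t, str):
        return theta.get(t, t)
    return (t[0],) + tuple(apply_subst(theta, arg) for arg in t[1:])

def dom(theta):
    """dom(theta): variables that are not mapped to themselves."""
    return {x for x, t in theta.items() if t != x}

def rng(theta):
    """rng(theta): variables occurring in the images of dom(theta)."""
    return set().union(*(term_vars(theta[x]) for x in dom(theta)))

def restrict(theta, r):
    """theta restricted to r: behaves as theta on r, as identity elsewhere."""
    return {x: t for x, t in theta.items() if x in r}

def in_theta_v(theta, v):
    """Membership test for Theta_V as defined in the text."""
    d, g = dom(theta), rng(theta)
    return (d | g) <= set(v) and not (d & g)
```

For example, with `theta = {"x": ("f", "y", ("a",))}`, which maps x to f(y, a), the domain is {x}, the range is {y}, and applying it to g(x, z) gives g(f(y, a), z).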

$\wp(S)$ is the powerset of a set S. $\wp_f(S)$ is the set of all finite subsets of S.

2.2 The domain of existential Herbrand constraints

Let $\mathcal{V}$ be an infinite set of variables. For each $V \in \wp_f(\mathcal{V})$, we have the set of
Herbrand equations

\[ C_V = \{\, t^1 = t^2 \mid t^1, t^2 \in \mathrm{terms}(\Sigma, V) \,\}. \]

Suppose $c \in \wp_f(C_V)$. Then we say $c\theta$ is true if $t^1\theta = t^2\theta$ for every $(t^1 = t^2) \in c$.
We know [18] that if there exists $\theta \in \Theta_V$ such that $c\theta$ is true, then c can be
put in normal form $\mathrm{mgu}(c) = \{\, t^1_j = t^2_j \mid j \in J \,\}$, where $t^1_j \in V$ are distinct
variables which do not occur in $t^2_k$ for every $j, k \in J$ and the variables in $\mathrm{mgu}(c)$
are all contained in the variables of c. Moreover, we have $c\theta$ is true if and only
if $\mathrm{mgu}(c)\theta$ is true. If no $\theta$ exists such that $c\theta$ is true, then $\mathrm{mgu}(c)$ and hence
the normal form of c are undefined. If c is in normal form, c can be seen as a
substitution, and we use the notation $c(v)$ meaning the term t on the right of
an equality $v = t \in c$ if such an equality exists, v itself otherwise.
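
To make the normal form tangible, here is a minimal Python sketch of a textbook unification procedure with occurs check; it is a standard algorithm swapped in for illustration and is not taken from [18]. It assumes the same hypothetical representation as above (variables are strings, compound terms are tuples headed by the function symbol) and returns the normal form as a dictionary, or `None` when no unifier exists.

```python
def term_vars(t):
    """vars(t): the set of variables occurring in the term t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(arg) for arg in t[1:]))

def apply_subst(theta, t):
    """Replace every variable x of t by theta(x), in parallel."""
    if isinstance(t, str):
        return theta.get(t, t)
    return (t[0],) + tuple(apply_subst(theta, arg) for arg in t[1:])

def mgu(equations):
    """Return a normal form {variable: term} of the equations, or None.

    The left-hand variables are distinct and do not occur in any
    right-hand side, so the result is an idempotent substitution.
    """
    subst = {}
    todo = list(equations)
    while todo:
        s, t = todo.pop()
        s, t = apply_subst(subst, s), apply_subst(subst, t)
        if s == t:
            continue
        if isinstance(t, str) and not isinstance(s, str):
            s, t = t, s                      # orient as variable = term
        if isinstance(s, str):
            if s in term_vars(t):
                return None                  # occurs check fails
            # Keep earlier bindings free of the newly bound variable.
            subst = {x: apply_subst({s: t}, u) for x, u in subst.items()}
            subst[s] = t
        elif s[0] == t[0] and len(s) == len(t):
            todo.extend(zip(s[1:], t[1:]))   # decompose f(...) = f(...)
        else:
            return None                      # symbol or arity clash
    return subst
```

For the single equation f(x, g(y)) = f(a, z), the result binds x to a and z to g(y); the equation x = f(x) has no unifier because of the occurs check.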

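Let $\mathcal{V}$ and $\mathcal{W}$ be disjoint infinite sets of variables. For each $V \in \wp_f(\mathcal{V})$, we
have a set of constraints, called existential Herbrand constraints:

\[ H_V = \{\, \exists_W c \mid W \in \wp_f(\mathcal{W}),\ c \in \wp_f(C_{V \cup W}) \,\}. \]

Here, $\mathcal{V}$ are called the program variables and $\mathcal{W}$ the existential variables. Given
a Herbrand constraint $h = \exists_W c \in H_V$, the set of its solutions is defined as

\[ \mathrm{sol}_V(h) = \{\, \theta|_V \mid \theta \in \Theta_{V \cup W},\ \mathrm{rng}(\theta) \subseteq V \text{ and } c\theta \text{ is true} \,\}. \]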
Hence $\mathrm{sol}_V(\exists_W c) = \mathrm{sol}_V(\exists_W\, \mathrm{mgu}(c))$. $\exists_W c$ is said to be in normal form if c is
in normal form.

Two constraints $h_1, h_2 \in H_V$ are called equivalent if and only if $\mathrm{sol}_V(h_1) =
\mathrm{sol}_V(h_2)$. In the following, a constraint will stand for its equivalence class. Since,
as shown above, every consistent constraint can be put in an equivalent normal
form, in the following we will consider only normal constraints.

Two operations and diagonal elements are defined on (normal) Herbrand 
constraints: 

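Definition 1 (Conjunction on $H_V$). Assuming $W_1 \cap W_2 = \emptyset$ ¹, the conjunction
of $\exists_{W_1} c_1$ and $\exists_{W_2} c_2$ is

\[ \exists_{W_1 \cup W_2}\, \mathrm{mgu}(c_1 \cup c_2) \quad \text{if } \mathrm{mgu}(c_1 \cup c_2) \text{ exists,} \]

and is undefined otherwise.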


Definition 2 (Restriction on $H_V$). We define the restriction of a Herbrand
constraint $\exists_W c$ with respect to a program variable x as

\[ \exists_x^{H_V}(\exists_W c) = \exists_{W \cup N}\, c[N/x] \quad \text{where } N \notin (V \cup W). \]

That is, we consider the program variable x as a new existential variable N.



Definition 3 (Diagonal elements on $H_V$). Given two sequences of distinct
variables $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ in V whose intersection is empty, we
define the diagonal element for $(x_1, \ldots, x_n), (y_1, \ldots, y_n)$ as the constraint

\[ \{\, x_i = y_i \mid i = 1, \ldots, n \,\}. \]

2.3 Abstract interpretation 

We briefly recall the main result about abstract interpretation [8]. Let $(C, \sqsubseteq)$
(concrete domain) and $(A, \leq)$ (abstract domain) be two posets. A Galois
connection between them is a pair of functions $\alpha$ (abstraction) and $\gamma$ (concretisation)
such that

1. $\alpha$ and $\gamma$ are monotonic,

2. for all $x \in C$, we have $x \sqsubseteq (\gamma \circ \alpha)(x)$ and

3. for all $y \in A$, we have $(\alpha \circ \gamma)(y) \leq y$.

A Galois connection is called a Galois insertion if $(\alpha \circ \gamma)(y) = y$. In a Galois
insertion the abstract domain does not contain useless elements, i.e., elements
which are not the image through $\alpha$ of some concrete element.
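
As a concrete illustration of these conditions, the following Python sketch sets up a toy Galois insertion. The example is entirely an assumption made here and is unrelated to the freeness domain: the concrete domain is the powerset of {-2, ..., 2} ordered by inclusion and the abstract domain is a five-element sign lattice. The three conditions and the insertion property can then be checked exhaustively.

```python
from itertools import combinations

UNIVERSE = (-2, -1, 0, 1, 2)

# Concretisation of each abstract element (hypothetical sign lattice).
GAMMA = {
    "bot": frozenset(),
    "neg": frozenset({-2, -1}),
    "zero": frozenset({0}),
    "pos": frozenset({1, 2}),
    "top": frozenset(UNIVERSE),
}

def gamma(a):
    """Concretisation: the set of integers described by a."""
    return GAMMA[a]

def leq(a, b):
    """Abstract order, read off from inclusion of concretisations."""
    return gamma(a) <= gamma(b)

def alpha(s):
    """Abstraction: the least abstract element whose meaning covers s."""
    candidates = [a for a in GAMMA if s <= gamma(a)]
    return min(candidates, key=lambda a: len(gamma(a)))

CONCRETE = [
    frozenset(c) for r in range(len(UNIVERSE) + 1) for c in combinations(UNIVERSE, r)
]

def is_galois_insertion():
    """Check monotonicity, extensivity, and alpha(gamma(y)) == y."""
    monotone_alpha = all(
        leq(alpha(s1), alpha(s2)) for s1 in CONCRETE for s2 in CONCRETE if s1 <= s2
    )
    monotone_gamma = all(
        gamma(a) <= gamma(b) for a in GAMMA for b in GAMMA if leq(a, b)
    )
    extensive = all(s <= gamma(alpha(s)) for s in CONCRETE)
    insertion = all(alpha(gamma(a)) == a for a in GAMMA)
    return monotone_alpha and monotone_gamma and extensive and insertion
```

In this toy setting `alpha(frozenset({-1, 0}))` is `"top"`, because no smaller abstract element covers both a negative number and zero.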

¹ Note that this is not restrictive because the names of existential variables are
irrelevant: given a Herbrand constraint $\exists_W c$, the constraint $\exists_{W'} c[W'/W]$ is equivalent
to it. Hence we can always assume Herbrand constraints to be renamed apart with
respect to existential variables.
Let $f\colon C^n \to C$ be a concrete operator and assume that $\tilde{f}\colon A^n \to A$. Then
$\tilde{f}$ is correct with respect to $f$ if and only if for all $x_1, \ldots, x_n \in A$ we have
$\alpha(f(\gamma(x_1), \ldots, \gamma(x_n))) \leq \tilde{f}(x_1, \ldots, x_n)$. For each operator $f$, there exists an
optimal (most precise) correct abstract operator $\tilde{f}$ defined as $\tilde{f}(y_1, \ldots, y_n) =
\alpha(f(\gamma(y_1), \ldots, \gamma(y_n)))$, where $\alpha$ is extended to sets $S \subseteq C$ by defining $\alpha(S) =
\bigwedge_{s \in S} \alpha(s)$. Note that this requires A to be completely $\wedge$-closed and to contain a
top element, i.e., to be a Moore family. In general, if X is a subset of a lattice L
then $\curlywedge(X)$ denotes the Moore-closure of X, i.e., the least subset of L containing
X which is a Moore family of L.

2.4 Domain refinement 

A systematic approach to the development of abstract domains is based on the use of domain refinement operators. Given an abstract domain $A$, a domain refinement operator $R$ yields an abstract domain $R(A)$ which is more precise than $A$. Classical domain refinement operators are reduced product and disjunctive completion [11]. The reduced product of two domains $A$ and $B$ is isomorphic to the Cartesian product of $A$ and $B$, modulo the equivalence relation $(a_1, b_1) \equiv (a_2, b_2)$ if and only if $a_1 \wedge b_1 = a_2 \wedge b_2$. This means that pairs having the same meaning are identified.

Linear completion was proposed in [12] as a powerful domain refinement operator. It allows one to include in a domain the information about how the abstract property of interest propagates before and after the application of a concrete operator $\boxtimes$.

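Let $L$ be a complete lattice and $\boxtimes$ be a partial map $\boxtimes : L \times L \to L$. The functions $\to^{\boxtimes}$ and $\leftarrow^{\boxtimes}$ are defined as

$$a \to^{\boxtimes} b = \bigvee_L \{l \in L \mid \text{if } a \boxtimes l \text{ is defined then } a \boxtimes l \le b\},$$
$$a \leftarrow^{\boxtimes} b = \bigvee_L \{l \in L \mid \text{if } l \boxtimes a \text{ is defined then } l \boxtimes a \le b\}. \qquad (1)$$

Note that if $\boxtimes$ is commutative then $a \leftarrow^{\boxtimes} b = a \to^{\boxtimes} b$ for every $a, b \in L$.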
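As a small illustration of Equation (1), the right arrow can be computed by enumeration on a finite lattice. The instance below is an assumption made for the example only: $L$ is the powerset of `{0, 1, 2}` ordered by inclusion (so the join is union) and the operator is set intersection, which is total and commutative. The function name `arrow` is hypothetical.

```python
from itertools import chain, combinations

def powerset(xs):
    subsets = chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))
    return [frozenset(c) for c in subsets]

def arrow(a, b, lattice, op):
    """Right arrow of Equation (1): join of every l such that op(a, l) <= b
    whenever op(a, l) is defined (op returns None when it is undefined)."""
    result = frozenset()
    for l in lattice:
        r = op(a, l)
        if r is None or r <= b:
            result |= l          # the join of this lattice is set union
    return result

L = powerset([0, 1, 2])
meet = lambda x, y: x & y        # assumed operator: intersection
```

For this choice of operator the arrow coincides with the familiar relative pseudo-complement: `arrow(a, b, L, meet)` is the complement of `a` joined with `b`.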

We define the linear refinement domain $L \to^{\boxtimes} L$ as

$$L \to^{\boxtimes} L = L \sqcap \curlywedge\{a \to^{\boxtimes} b,\ a \leftarrow^{\boxtimes} b \mid a, b \in L\},$$

which can be simplified into

$$L \to^{\boxtimes} L = \curlywedge\{a \to^{\boxtimes} b,\ a \leftarrow^{\boxtimes} b \mid a, b \in L\}$$

if the elements of $L$ can be obtained as (greatest lower bounds of) arrows. This case is important since it allows a simpler representation of the domain and simpler operations on this representation.

The right linear refinement of $L$ is defined as

$$L \to^{\boxtimes}_r L = L \sqcap \curlywedge\{a \to^{\boxtimes} b \mid a, b \in L\}. \qquad (2)$$

Again, if the elements of $L$ can be obtained as (greatest lower bounds of) arrows, Equation (2) can be simplified into

$$L \to^{\boxtimes}_r L = \curlywedge\{a \to^{\boxtimes} b \mid a, b \in L\}. \qquad (3)$$

When not explicitly stated, $L \to^{\boxtimes} L$ will stand for $L \to^{\boxtimes}_r L$.

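3 Freeness analysis

The concrete domain we abstract is the collecting version [8] of the domain of Herbrand constraints introduced in Subsection 2.2, i.e., the lattice $\langle \wp(H_V), \cap, \cup, H_V, \emptyset \rangle$. The operations and the diagonal elements on $H_V$ are point-wise extended to $\wp(H_V)$ as follows:

$$S_1 \star^{\wp(H_V)} S_2 = \{h_1 \star^{H_V} h_2 \mid h_1 \in S_1,\ h_2 \in S_2 \text{ and } h_1 \star^{H_V} h_2 \text{ is defined}\}$$

$$\exists_x^{\wp(H_V)} S = \{\exists_x^{H_V} h \mid h \in S\}, \qquad \delta^{\wp(H_V)}_{x,y} = \{\delta^{H_V}_{x,y}\}.$$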

The first abstraction of this domain is a domain able to model basic freeness and 
nothing else. 

Given a finite set of variables $V$, we define the set of variables that are free in a Herbrand constraint as

$$\mathrm{free}(\exists_W c) = \{v \in V \mid c(v) \in (V \cup W)\}.$$

The set of Herbrand constraints such that the variables $U$ are free is defined as

$$\mathrm{free}(U) = \{h \in H_V \mid U \subseteq \mathrm{free}(h)\}.$$

The basic abstract domain for freeness analysis is

$$\mathit{Free}_V = \{\mathrm{free}(U) \mid U \subseteq V\},$$

ordered with respect to set inclusion. We have that $\mathrm{free}(\emptyset)$ is the whole set of Herbrand constraints, and $\mathrm{free}(V_1) \cap \mathrm{free}(V_2) = \mathrm{free}(V_1 \cup V_2)$ for every $V_1, V_2 \subseteq V$. Then $\mathit{Free}_V$ is a Moore family of $\wp(H_V)$.

From now on, capital letters will stand for sets of variables, small letters for single variables and bold face letters for elements of the domain, originated by the corresponding variables. For instance, we will write $\mathbf{U}$ for $\mathrm{free}(U)$ and $\mathbf{xyz}$ for $\mathrm{free}(\{x, y, z\})$. $\boldsymbol{\emptyset}$ will stand for $\mathrm{free}(\emptyset)$, but we will write explicitly $\mathrm{free}(\emptyset)$ when this notation can give rise to ambiguity. We write $\mathbf{U} \setminus \mathbf{U'}$ for $\mathrm{free}(U \setminus U')$ for all sets of variables $U$ and $U'$.

Consider $c = \{x = f(y), z = w\}$ and $V = \{k, x, y, w, z\}$. We have $c \in \mathbf{kywz}$, but $c \notin \mathbf{kxywz}$.
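The example above can be replayed with a short Python sketch. It assumes, for illustration only, that a constraint is given in solved form as a pair `(W, c)` where `c` maps bound variables to terms, that variables are strings and that other terms are tuples; `free_vars`, `in_free` and `alpha_free` are hypothetical names.

```python
def free_vars(h, V):
    """free(h): program variables whose value c(v) is itself a variable."""
    W, c = h
    return {v for v in V if isinstance(c.get(v, v), str)}

def in_free(h, U, V):
    """Membership test h in free(U)."""
    return set(U) <= free_vars(h, V)

def alpha_free(S, V):
    """Abstraction of a set S of constraints in the basic freeness domain:
    the largest U such that every constraint of S belongs to free(U)."""
    U = set(V)
    for h in S:
        U &= free_vars(h, V)
    return U

V = {"k", "x", "y", "w", "z"}
c = (set(), {"x": ("f", "y"), "z": "w"})   # {x = f(y), z = w}
```

With these definitions `free_vars(c, V)` contains every variable except `x`, which matches the two membership claims of the example.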

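Elements of $\wp(H_V)$ can be mapped into $\mathit{Free}_V$ by the abstraction map $\alpha_V$ which is an upper closure operator:

$$\alpha_V(S) = \bigcap \{f \in \mathit{Free}_V \mid S \subseteq f\}.$$

The concretisation map $\gamma_V$ is the identity map. $(\alpha_V, \gamma_V)$ is a Galois insertion from $\wp(H_V)$ into $\mathit{Free}_V$.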
4 A naive approach to linear refinement

Consider the two Herbrand constraints $c_1 = \{x = a\}$ and $c_2 = \{y = a\}$ over $V = \{x, y, z\}$. Their unification leaves $z$ free. A useful domain for freeness analysis must capture this behaviour. In $\mathit{Free}_V$, $c_1$ is abstracted into $\mathbf{yz}$ and $c_2$ into $\mathbf{xz}$. Let $\star^{\mathit{Free}_V}$ be the best correct approximation of $\star^{\wp(H_V)}$. We have $\mathbf{yz} \star^{\mathit{Free}_V} \mathbf{xz} = \mathrm{free}(\emptyset)$ which is not contained in $\mathbf{z}$ (since $\{x = a, y = z\}$ is abstracted into $\mathbf{yz}$ as $c_1$, and its unification with $c_2$ leaves $z$ non free). This result means that $\mathit{Free}_V$ is useless for freeness analysis.

It seems reasonable to use $\boxtimes = \star^{\wp(H_V)}$ in the linear refinement definition of Subsection 2.4, in order to improve the $\mathit{Free}_V$ domain with freeness dependencies information. The operation $\vee$ is set union and the ordering $\le$ is set inclusion. We obtain the domains

$$Fr^1_V = \mathit{Free}_V, \qquad Fr^{i+1}_V = Fr^i_V \to Fr^i_V \quad \text{for } i \ge 1.$$

Consider $Fr^2_V$. The constraint $c_1$ is contained in $\mathbf{yz}$. We want to show that $\mathbf{yz}$ is the best approximation of $c_1$ in $Fr^2_V$. Indeed, if $c_1 \in L \to R$ and $L \to R$ is not top, then $R \neq \mathrm{free}(\emptyset)$. Since $c_3 = \{x = y, z = y\}$ belongs to every element of $\mathit{Free}_V$ and hence to $L$, we would have $c_1 \star^{H_V} c_3 \in R$, which is a contradiction since $R \neq \mathrm{free}(\emptyset)$ and all three variables are made non free by the unification. Then $c_1$ is abstracted into $\mathbf{yz}$ in $Fr^2_V$. By symmetry, $c_2$ is abstracted into $\mathbf{xz}$ in $Fr^2_V$. The same argument used above in the case of $\mathit{Free}_V$ shows that the best correct approximation of the operation $\star^{\wp(H_V)}$ is not contained in $\mathbf{z}$ when applied to $c_1$ and $c_2$. Again, $Fr^2_V$ is useless in practice for freeness analysis. Moreover, we must use Equation (2) rather than Equation (3), since the elements of $\mathit{Free}_V$ cannot be obtained as arrows. This means that $Fr^2_V$ is a complex domain, formed by elements of $\mathit{Free}_V$ and new elements obtained as arrows. Devising a representation for $Fr^2_V$ and correct (and possibly optimal) operators on this representation would not be an easy task.

Consider $Fr^3_V$. This domain might be able to conclude that the unification of $c_1$ and $c_2$ leaves $z$ free. However, it is extremely complex. Devising a representation with correct (and possibly optimal) operators becomes almost infeasible. The resulting operators would be computationally expensive.

These arguments show that this is not the refinement we were looking for. In the following, we will define a different freeness domain as linear refinement of the domain $\mathit{Free}_V$ where, however, the operation $\star^{\wp(H_V)}$ is replaced by a new operation that protects local variables.

5 Internal freeness dependencies 

Consider the constraint $c = \{y = f(a)\}$. We cannot be sure that $x$ is still free after conjunction with a constraint $c'$, even one where both $x$ and $y$ are free. However, if $x$ is free in $c'$, instantiation of $x$ in $\mathrm{mgu}(c, c')$ can only be a consequence of the non-freeness of $y$ in $c$ which pushes $c'$ to instantiate $x$. Then, from the point of view of $c$, it is an external mechanism. Namely, $c$ instantiates $y$ and $c'$ replies by instantiating $x$ in turn.

Assume we want to model the effects internal to $c$ of the conjunction with $c'$. For instance, in the conjunction of the constraint $c = \{y = f(a), x = v, h = w\}$ with $c' = \{y = f(x), w = a\}$, we want to model the instantiation of $w$ and $h$, but not the instantiation of $x$ and $v$, which are an external instantiation and a consequence of an external instantiation from the point of view of $c$. This can be achieved by refining the $\mathit{Free}_V$ domain with respect to the operation



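$$\mathrm{internal}^{\wp(H_V)}(p_1, p_2) = \{\mathrm{internal}^{H_V}(h_1, h_2) \mid h_1 \in p_1,\ h_2 \in p_2 \text{ and } \mathrm{internal}^{H_V}(h_1, h_2) \text{ is defined}\},$$

for every $p_1, p_2 \in \wp(H_V)$, where $\mathrm{internal}^{H_V} = \lambda h, h'.\ h \star^{H_V} \mathrm{over}(h')$ and $\mathrm{over}(h')$ is $h'$ where all free variables have been renamed into new, overlined, variables. Formally, we define $\mathrm{over}(\exists_W c) = \exists_{\mathrm{vars}(\mathrm{over}(c)) \setminus V}\, \mathrm{over}(c)$, where $\mathrm{over}(\{x_1 = t_1, \ldots, x_n = t_n\}) = \{\mathrm{over}(x_1 = t_1), \ldots, \mathrm{over}(x_n = t_n)\}$ and

$$\mathrm{over}(x_i = t_i) = \begin{cases} \mathrm{over}(x_i) = \mathrm{over}(t_i) & \text{if } t_i \in V \cup W \\ x_i = \mathrm{over}(t_i) & \text{otherwise} \end{cases}$$

$$\mathrm{over}(f(t_1, \ldots, t_m)) = f(\mathrm{over}(t_1), \ldots, \mathrm{over}(t_m))$$

$$\mathrm{over}(v) = \begin{cases} \bar{v} & \text{if } v \in V \\ v & \text{if } v \in W. \end{cases}$$

For instance, $\mathrm{over}(\{y = f(a), x = v, h = w\}) = \exists_{\{\bar{x}, \bar{v}, \bar{h}, \bar{w}\}} \{y = f(a), \bar{x} = \bar{v}, \bar{h} = \bar{w}\}$ and $\mathrm{over}(\exists_{\{w\}} \{x = f(w), y = w, h = g(z)\}) = \exists_{\{w, \bar{y}, \bar{z}\}} \{x = f(w), \bar{y} = w, h = g(\bar{z})\}$.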

By computing $\mathrm{mgu}(c, \mathrm{over}(c'))$, we can observe in isolation the internal and immediate effects of $c'$ on $c$, without bothering about the fact that these effects can reach $c'$, inducing some new instantiations that can reach $c$ in turn, and so on.
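The renaming performed by over can be sketched as follows. The encoding is an assumption of this example: a constraint is a pair `(W, c)` in solved form, variables are strings, other terms are tuples, and an overlined variable is written with the suffix `_bar`. The names `over`, `over_var`, `over_term` and `term_vars` are hypothetical.

```python
def term_vars(t):
    """Variables occurring in a term."""
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))

def over_var(v, V):
    """Program variables are overlined, existential ones are left alone."""
    return v + "_bar" if v in V else v

def over_term(t, V):
    if isinstance(t, str):
        return over_var(t, V)
    return (t[0],) + tuple(over_term(a, V) for a in t[1:])

def over(h, V):
    """Rename apart the free variables of h = (W, c)."""
    W, c = h
    new_c = {}
    for x, t in c.items():
        # the left-hand side is renamed only when the right-hand side is a variable
        lhs = over_var(x, V) if isinstance(t, str) else x
        new_c[lhs] = over_term(t, V)
    all_vars = set()
    for x, t in new_c.items():
        all_vars |= {x} | term_vars(t)
    return (all_vars - set(V), new_c)
```

Applied to the two constraints used as examples in the text, this sketch produces the overlined constraints one obtains by following the formal definition case by case.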

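The expansion of the right arrow $\to$ of Equation (1) yields in our case (note that $\mathrm{internal}^{\wp(H_V)}$ is a total map, while $\star^{H_V}$ is not):

$$L \to^{\mathrm{internal}^{\wp(H_V)}} R = \bigvee_{\wp(H_V)} \{p \in \wp(H_V) \mid \mathrm{internal}^{\wp(H_V)}(p, L) \le_{\wp(H_V)} R\}$$
$$= \bigcup \{p \in \wp(H_V) \mid \mathrm{internal}^{\wp(H_V)}(p, L) \subseteq R\}$$
$$= \{h \in H_V \mid \text{for all } h' \in H_V, \text{ if } h' \in L \text{ and } h \star^{H_V} \mathrm{over}(h') \text{ is defined then } h \star^{H_V} \mathrm{over}(h') \in R\}.$$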



These kinds of dependencies will be called internal dependencies and can be seen as the building blocks of the unsatisfactory, global dependencies of the previous section.

We define the refinement:

$$\mathit{Free}^{\to}_V = \mathit{Free}_V \to^{\mathrm{internal}^{\wp(H_V)}} \mathit{Free}_V.$$

Note that every point $\mathbf{U}$ of $\mathit{Free}_V$ is such that $\mathbf{U} = \mathbf{V} \to^{\mathrm{internal}^{\wp(H_V)}} \mathbf{U}$. Then we can use Equation (3) instead of Equation (2).

From now on, $\to$ will stand for $\to^{\mathrm{internal}^{\wp(H_V)}}$.

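$\mathit{Free}^{\to}_{\{x,y\}}$ is depicted in Figure 1. $\boldsymbol{\emptyset} \to \mathbf{x}$ contains no constraint, $\mathbf{xy} \to \mathbf{xy} \cap \mathbf{x} \to \mathbf{x} = \mathbf{xy} \to \mathbf{xy} \cap \mathbf{y} \to \mathbf{y} = \mathbf{x} \to \mathbf{x} \cap \mathbf{y} \to \mathbf{y} = \{\epsilon\}$. $\mathbf{xy} \to \mathbf{xy} = \{\epsilon, \{x = y\}\}$. $\mathbf{x} \to \mathbf{x}$ contains $\epsilon$ and constraints like $\{y = f(a)\}$ or $\{y = f(w)\}$. $\mathbf{xy} \to \mathbf{x}$ contains the constraints above, $\{x = y\}$ and constraints like $\{y = f(x)\}$. Symmetrically for $\mathbf{y} \to \mathbf{y}$ and $\mathbf{xy} \to \mathbf{y}$. The top element contains the whole set of constraints. Namely, it is the only point containing constraints like $\{x = f(a), y = f(f(a))\}$. This means that the abstraction of $\{x = f(a)\}$ is $\mathbf{y} \to \mathbf{y}$, the abstraction of $\{y = f(a)\}$ is $\mathbf{x} \to \mathbf{x}$ and the abstraction of $\{x = f(a), y = f(a)\}$ is $\boldsymbol{\emptyset} \to \boldsymbol{\emptyset}$.

Considering $V = \{x, y, w, z\}$, the constraint $\{x = f(y)\}$ belongs to $\mathbf{z} \to \mathbf{z}$. Then, from the point of view of internal freeness dependencies, $z$ is free after composition with a constraint where $z$ is free, where composition means application of the $\mathrm{internal}^{H_V}$ operator.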



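[Figure 1: lattice diagram of $\mathit{Free}^{\to}_{\{x,y\}}$, with top element $\boldsymbol{\emptyset} \to \boldsymbol{\emptyset} = H_V$.]

Fig. 1. The domain $\mathit{Free}^{\to}_{\{x,y\}}$ of internal freeness dependencies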

The abstraction map $\alpha_{Free} : \wp(H_V) \to \mathit{Free}^{\to}_V$ is induced by the abstract domain $\mathit{Free}^{\to}_V$. Namely, the pair $(\alpha_{Free}, \gamma_{Free})$ is defined as

$$\alpha_{Free}(S) = \bigcap \{f \in \mathit{Free}^{\to}_V \mid S \subseteq f\} \quad \text{for every } S \in \wp(H_V)$$

$$\gamma_{Free}(f) = f \quad \text{for every } f \in \mathit{Free}^{\to}_V,$$

and forms a Galois insertion between $\wp(H_V)$ and $\mathit{Free}^{\to}_V$.

The concrete operations of Section 3 induce corresponding abstract operations on $\mathit{Free}^{\to}_V$. We will show that the internal freeness dependencies can be used to model the composition (the $\star^{\wp(H_V)}$ operation) in a precise way. The insight is that to compute the global dependencies between two constraints we have to exploit our knowledge of their internal dependencies.

The next section introduces a computer representation of the elements of $\mathit{Free}^{\to}_V$. Later we will provide algorithms on this representation for computing the abstract operations induced by the operations of Section 3.
6 A representation for $\mathit{Free}^{\to}_V$

The obvious choice for the representation of freeness dependencies is to use pairs of sets of variables, with a distinguished element $\bot$. We will write the pair $(\{v_1, \ldots, v_n\}, \{v'_1, \ldots, v'_m\})$, $n, m \ge 1$, as $v_1 \cdots v_n \Rightarrow v'_1 \cdots v'_m$ and it will represent the set of constraints $\mathbf{v_1 \cdots v_n} \to \mathbf{v'_1 \cdots v'_m}$. Let $\mathit{Rep}_V = \wp(\wp(V) \times \wp(V)) \cup \{\bot\}$. The concretisation function $\gamma_{Rep} : \mathit{Rep}_V \to \mathit{Free}^{\to}_V$ is defined as

$$\gamma_{Rep}(\{A_1, \ldots, A_k\}) = \bigcap_{i=1}^{k} \gamma_{Rep}(A_i) \quad \text{where} \quad \gamma_{Rep}(v_1 \cdots v_n \Rightarrow v'_1 \cdots v'_m) = \mathbf{v_1 \cdots v_n} \to \mathbf{v'_1 \cdots v'_m}.$$

Note that top is represented by the empty set. Since different elements of $\mathit{Rep}_V$ can have the same concretisation, we consider an equivalence relation over $\mathit{Rep}_V$ defined as $r_1 \equiv r_2$ if and only if $\gamma_{Rep}(r_1) = \gamma_{Rep}(r_2)$. From now on, every element of $\mathit{Rep}_V$ will stand for its equivalence class, and $\mathit{Rep}_V$ will stand for $\mathit{Rep}_V/\!\equiv$. $\gamma_{Rep}$ will be applied to equivalence classes and the corresponding abstraction function is defined as follows. Given some $I \subseteq \mathbb{N}$ and, for each $i \in I$, $L_i \to R_i \in \mathit{Free}^{\to}_V$, then

$$\alpha_{Rep}\Big(\bigcap_{i \in I} \{L_i \to R_i\}\Big) = \{L_i \Rightarrow R_i \mid i \in I\}. \qquad (4)$$

It can be easily checked that $(\alpha_{Rep}, \gamma_{Rep})$ is a Galois insertion between $\mathit{Free}^{\to}_V$ and $\mathit{Rep}_V$. Every operation in $\wp(H_V)$ induces a corresponding operation in $\mathit{Free}^{\to}_V$ which induces a corresponding operation in $\mathit{Rep}_V$. To use $\mathit{Rep}_V$ for freeness analysis of logic programs we must provide

1. an algorithm for computing $\alpha_{Rep}\alpha_{Free}(\{h\})$ for every $h \in H_V$ (Section 7);

2. an algorithm for computing the set of variables that are free in every Herbrand constraint belonging to the concretisation of an element of $\mathit{Rep}_V$ (Section 7);

3. an algorithm for computing the operation induced by $\star^{\wp(H_V)}$ (Section 8);

4. an algorithm for computing the operation induced by $\exists_x^{\wp(H_V)}$ for every $x \in V$ (Section 8);

5. an element of $\mathit{Rep}_V$ corresponding to $\delta^{\wp(H_V)}_{x,y}$ for all sequences $x$ and $y$ whose intersection is empty (Section 8).

7 The abstraction function 

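In this section we provide an algorithmic definition of the restriction of the function $\alpha_{Rep}\alpha_{Free} : \wp(H_V) \to \mathit{Rep}_V$ to singletons.

Definition 4. Given $v \in V$, we define $L_c(v) = \{v' \in V \mid \mathrm{vars}(c(v')) \cap \mathrm{vars}(c(v)) \neq \emptyset\}$. We define $\alpha_{alg}(\exists_W c) = \{(L_c(v) \setminus W) \Rightarrow v \mid v \in \mathrm{free}(\exists_W c)\}$.

We prove that $\alpha_{alg}(h)$ is equivalent to $\alpha_{Rep}\alpha_{Free}(\{h\})$ for every $h \in H_V$.

Proposition 1. Given $h \in H_V$, we have $\alpha_{alg}(h) = \alpha_{Rep}\alpha_{Free}(\{h\})$.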
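Example 1. Consider $V = \{x, y, v, z\}$. The table below shows some Herbrand constraints and their abstraction through $\alpha_{alg}$:

  $h$                                                       $\alpha_{alg}(h)$
  $\{v = f(x), z = a\}$                                     $\{vx \Rightarrow x,\ y \Rightarrow y\}$
  $\{v = g(x, y), z = f(y)\}$                               $\{vx \Rightarrow x,\ vyz \Rightarrow y\}$
  $\{v = g(x, y), z = y\}$                                  $\{vx \Rightarrow x,\ vyz \Rightarrow y,\ vyz \Rightarrow z\}$
  $\{v = x, y = x\}$                                        $\{vxy \Rightarrow x,\ vxy \Rightarrow y,\ vxy \Rightarrow v,\ z \Rightarrow z\}$
  $\exists_{\{w,s\}} \{v = f(w, y), s = f(x), z = w\}$      $\{x \Rightarrow x,\ yv \Rightarrow y,\ vz \Rightarrow z\}$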
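Definition 4 translates almost literally into code, and the table of Example 1 can be used to check the translation. The sketch below assumes constraints in solved form, given as a pair `(W, c)` with `c` a dictionary from bound variables to terms; variables are strings and other terms are tuples. The names `alpha_alg` and `term_vars` are hypothetical.

```python
def term_vars(t):
    """Variables occurring in a term."""
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))

def alpha_alg(h, V):
    """Set of arrows (body, head) built for the free variables of h = (W, c)."""
    W, c = h
    value = lambda v: c.get(v, v)           # c(v): the term bound to v, or v itself
    arrows = set()
    for v in V:
        if not isinstance(value(v), str):   # v is bound to a non-variable term
            continue
        shared = term_vars(value(v))
        body = {u for u in V if term_vars(value(u)) & shared}
        arrows.add((frozenset(body - set(W)), v))
    return arrows

V = {"x", "y", "v", "z"}
row5 = ({"w", "s"}, {"v": ("f", "w", "y"), "s": ("f", "x"), "z": "w"})
```

Each arrow is a pair whose first component is the body and whose second component is the head variable, so `alpha_alg(row5, V)` corresponds to the last row of the table.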



Definition 5. $A \in \mathit{Rep}_V$ is in normal form if it is $\bot$ or if $A = \{B_1 \Rightarrow v_1, \ldots, B_n \Rightarrow v_n\}$ with $v_i \in B_i$ for every $i = 1, \ldots, n$ and $v_i \neq v_j$ for every $i, j = 1, \ldots, n$, $i \neq j$.

Note that, by definition, $\alpha_{alg}(h)$ is in normal form for every $h \in H_V$.

Proposition 2. For all $A \in \mathit{Rep}_V$ there is an $A' \in \mathit{Rep}_V$ in normal form such that $\gamma_{Rep}(A) = \gamma_{Rep}(A')$.

In the following, we always consider elements in normal form as representative of elements of $\mathit{Rep}_V$.

We want an algorithm for computing the set of variables that are free in every consistent Herbrand constraint belonging to the concretisation of $A \in \mathit{Rep}_V$. The following definition provides such an algorithm:

Definition 6. We define the map $\mathrm{free}_{alg} : \mathit{Rep}_V \to \wp(V)$ as $\mathrm{free}_{alg}(\bot) = V$ and $\mathrm{free}_{alg}(\{B_1 \Rightarrow v_1, \ldots, B_n \Rightarrow v_n\}) = \{v_1, \ldots, v_n\}$.

Proposition 3. Given $A \in \mathit{Rep}_V$, we have $\bigcap_{h \in \gamma_{Free}\gamma_{Rep}(A)} \mathrm{free}(h) = \mathrm{free}_{alg}(A)$.

8 The abstract operators 

In this section we define the abstract counterparts of the conjunction and re- 
striction operators, as well as of the diagonal elements of Section 3. 

The abstract conjunction operator constructs an arrow for a variable $v$ by unfolding the body of an arrow for $v$ contained in $A_1$ with the arrows in $A_2$ and so on alternately. If this unfolding fails, no arrow is built for the variable $v$.

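Definition 7 (Conjunction on $\mathit{Rep}_V$). Given $A_1, A_2 \in \mathit{Rep}_V$ we define

$$A_1 \star^{\mathit{Rep}_V} A_2 = \begin{cases} \bot & \text{if } A_1 = \bot \text{ or } A_2 = \bot \\ \Big\{ \bigcup_{i \ge 1} \mathrm{dunf}^i_{A_1,A_2}(B) \Rightarrow v \ \Big|\ B \Rightarrow v \in A_1 \text{ and } \mathrm{dunf}^i_{A_1,A_2}(B) \neq \mathit{fail} \text{ for every } i \ge 1 \Big\} & \text{otherwise,} \end{cases}$$

where $\mathrm{dunf}^1_{A_1,A_2}(B) = B$, $\mathrm{dunf}^i_{A_1,A_2}(B) = \mathrm{unf}(\mathrm{dunf}^{i-1}_{A_1,A_2}(B), A_{1+((i-1) \bmod 2)})$ for $i \ge 2$, and

$$\mathrm{unf}(\mathit{fail}, A) = \mathit{fail}, \qquad \mathrm{unf}(\emptyset, A) = \emptyset,$$

$$\mathrm{unf}(v_1 \cdots v_n, A) = \begin{cases} B_1 \setminus v_1 \cdots B_n \setminus v_n & \text{if } \exists (B_i \Rightarrow v_i) \in A \text{ for } i = 1, \ldots, n \\ \mathit{fail} & \text{otherwise.} \end{cases}$$

Note that the definition above provides an algorithm for computing $\star^{\mathit{Rep}_V}$ since, $V$ being finite, there must exist two different natural numbers $i, j$, both even or both odd, such that $\mathrm{dunf}^i_{A_1,A_2}(B) = \mathrm{dunf}^j_{A_1,A_2}(B)$. Then the computation of the union can be stopped at the $\max(i, j)$-th iteration. Moreover, note that the operation $\star^{\mathit{Rep}_V}$ is closed on the set of elements of $\mathit{Rep}_V$ that are in normal form.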



Example 2. Consider the two sets of arrows $A_1 = \{xy \Rightarrow x, vz \Rightarrow v, vz \Rightarrow z\}$ and $A_2 = \{yz \Rightarrow y, x \Rightarrow x, v \Rightarrow v\}$. Consider the variable $x$. There is an arrow $xy \Rightarrow x$ for $x$ in $A_1$. Then we start unfolding $xy$ and we obtain $\mathrm{dunf}^1_{A_1,A_2} = xy$, $\mathrm{dunf}^2_{A_1,A_2} = z$, $\mathrm{dunf}^3_{A_1,A_2} = v$ and $\mathrm{dunf}^i_{A_1,A_2} = \emptyset$ for every $i \ge 4$. Hence the abstract conjunction contains the arrow $vxyz \Rightarrow x$. If we consider the variable $y$, we do not find any arrow for $y$ in $A_1$. Then no arrow for $y$ is computed by the algorithm. If we consider the variable $v$, we have $\mathrm{dunf}^1_{A_1,A_2} = vz$ and $\mathrm{dunf}^2_{A_1,A_2} = \mathit{fail}$, since there is no arrow for $z$ in $A_2$. Then the algorithm does not add any arrow for $v$ in the abstract unification.

Note that $h_1 = \{y = f(x), z = v\} \in \gamma_{Free}\gamma_{Rep}(A_1)$ and $h_2 = \{z = g(y)\} \in \gamma_{Free}\gamma_{Rep}(A_2)$. Their concrete conjunction is $h = \{v = g(f(x)), y = f(x), z = g(f(x))\}$. Note that neither $y$ nor $v$ is free in $h$. Then it is correct that the result of the abstract unification algorithm does not contain any arrow for those two variables. Moreover, note that the arrow $vxyz \Rightarrow x$, computed by the algorithm, is correct for $h$.
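The unfolding procedure can be sketched as follows and checked against Example 2. The encoding is an assumption of this sketch: an element in normal form is a dictionary from each head variable to its body (a frozenset of variables), `None` plays the role of the distinguished bottom element, and the names `unf`, `conj_rep` and `FAIL` are hypothetical. The loop stops as soon as a pair (current set, parity of the step) repeats, which is one way to implement the stopping argument given after Definition 7.

```python
FAIL = "fail"

def unf(body, A):
    """Replace every variable v of body by B minus {v}, where B => v is in A."""
    if body == FAIL:
        return FAIL
    out = set()
    for v in body:
        if v not in A:          # no arrow for v: the unfolding fails
            return FAIL
        out |= A[v] - {v}
    return frozenset(out)

def conj_rep(A1, A2):
    """Abstract conjunction on normal forms; None stands for bottom."""
    if A1 is None or A2 is None:
        return None
    result = {}
    for v, B in A1.items():
        cur, acc, i, seen = frozenset(B), set(B), 1, set()
        while cur != FAIL and (cur, i % 2) not in seen:
            seen.add((cur, i % 2))
            i += 1
            # step i unfolds with A2 when i is even and with A1 when i is odd
            cur = unf(cur, (A1, A2)[(i - 1) % 2])
            if cur != FAIL:
                acc |= cur
        if cur != FAIL:
            result[v] = frozenset(acc)
    return result

A1 = {"x": frozenset("xy"), "v": frozenset("vz"), "z": frozenset("vz")}
A2 = {"y": frozenset("yz"), "x": frozenset("x"), "v": frozenset("v")}
```

On the two sets of arrows of Example 2, `conj_rep(A1, A2)` keeps a single arrow, the one for `x`, whose body is the union of all the unfolding steps.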

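Proposition 4. Given $A_1, A_2 \in \mathit{Rep}_V$ we have

$$\gamma_{Free}\gamma_{Rep}(A_1 \star^{\mathit{Rep}_V} A_2) \supseteq \gamma_{Free}\gamma_{Rep}(A_1) \star^{\wp(H_V)} \gamma_{Free}\gamma_{Rep}(A_2).$$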



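Definition 8 (Restriction on $\mathit{Rep}_V$). Given $A \in \mathit{Rep}_V$ and $x \in V$ we define

$$\exists_x^{\mathit{Rep}_V} A = \begin{cases} \{x \Rightarrow x\} \cup \{(L - x) \Rightarrow (R - x) \mid L \Rightarrow R \in A,\ R \neq \{x\}\} & \text{if } A \neq \bot \\ \bot & \text{otherwise.} \end{cases}$$

Note that the operation $\exists_x^{\mathit{Rep}_V}$ is closed on the set of elements of $\mathit{Rep}_V$ that are in normal form.

Proposition 5. Given $A \in \mathit{Rep}_V$ and a variable $x$, we have

$$\gamma_{Free}\gamma_{Rep}(\exists_x^{\mathit{Rep}_V} A) \supseteq \exists_x^{\wp(H_V)} \gamma_{Free}\gamma_{Rep}(A).$$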
Definition 9 (Diagonal elements on $\mathit{Rep}_V$). Given $x = \langle x_1, \ldots, x_n \rangle$ and $y = \langle y_1, \ldots, y_n \rangle$ formed by distinct variables and whose intersection is empty, we define

$$\delta^{\mathit{Rep}_V}_{x,y} = \{x_i y_i \Rightarrow y_i,\ x_i y_i \Rightarrow x_i \mid i = 1, \ldots, n\}.$$

Note that $\delta^{\mathit{Rep}_V}_{x,y}$ is in normal form.
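The restriction of Definition 8 and the diagonal elements of Definition 9 are simple enough to be sketched directly. As an assumption of this example, an element in normal form is a dictionary from head variables to bodies (frozensets), `None` stands for the bottom element, and `restrict_rep` and `diag_rep` are hypothetical names.

```python
def restrict_rep(A, x):
    """Restriction on normal forms: drop the arrow for x, remove x from every
    other body, then add the arrow x => x."""
    if A is None:
        return None
    out = {v: frozenset(B - {x}) for v, B in A.items() if v != x}
    out[x] = frozenset({x})
    return out

def diag_rep(xs, ys):
    """Diagonal element for two disjoint sequences of distinct variables."""
    out = {}
    for xi, yi in zip(xs, ys):
        out[xi] = frozenset({xi, yi})
        out[yi] = frozenset({xi, yi})
    return out
```

Both functions return dictionaries whose heads belong to their own bodies, so the results are again in normal form.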

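Proposition 6. Given $x = \langle x_1, \ldots, x_n \rangle$ and $y = \langle y_1, \ldots, y_n \rangle$ formed by distinct variables and whose intersection is empty, we have

$$\gamma_{Free}\gamma_{Rep}\big(\delta^{\mathit{Rep}_V}_{x,y}\big) \supseteq \delta^{\wp(H_V)}_{x,y}.$$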



9 An example 

We show here a simple example of freeness analysis of logic programs, based on a call pattern version of the s-semantics [2]. Given an input goal for a procedure and a selection rule, a (concrete) call pattern is the name of a procedure call found during the resolution of the goal, together with the partial Herbrand constraint computed up to that call, restricted to the arguments of the procedure. Note that the input goal itself is a call pattern. Let us call $\iota_1, \ldots, \iota_n$ the arguments of the input goal and $\kappa_1, \ldots, \kappa_m$ the arguments of a call pattern found during the resolution of the input goal. A call pattern semantics collects every call pattern found during the resolution of a goal. An abstract call pattern semantics is the same, except that it uses abstract constraints instead of concrete ones. Here, we will use elements of $\mathit{Free}^{\to}_V$ as the abstract constraints and we will assume a leftmost selection rule.

Consider the following program, taken from [20], that computes the product 
of a matrix and a vector: 

multiply([],V,[]).
multiply([VM|MM],V,[H|T]) :- vmul(VM,V,H), multiply(MM,V,T).
vmul([],[],0).
vmul([H1|T1],[H2|T2],R) :- vmul(T1,T2,R1), R is (R1+H1*H2).
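For readers less familiar with Prolog, the program above computes what the following Python sketch computes. It is a functional transcription written for illustration only: it ignores the relational reading of the clauses and raises an error, rather than failing, when the two vectors have different lengths.

```python
def vmul(xs, ys):
    """Dot product of two vectors of the same length, mirroring the two clauses of vmul."""
    if not xs and not ys:
        return 0
    (h1, *t1), (h2, *t2) = xs, ys
    return vmul(t1, t2) + h1 * h2

def multiply(rows, v):
    """Product of a matrix (list of rows) and a vector."""
    if not rows:
        return []
    vm, *mm = rows
    # the head H of the result depends only on vm and v, the tail T only on mm and v
    return [vmul(vm, v)] + multiply(mm, v)
```

The comment in `multiply` points at the property the analysis is after: the two recursive computations do not share their outputs.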

The call pattern semantics of the predicate vmul contains just the following two call patterns, both for vmul:

$\{\iota_1\kappa_1 \Rightarrow \kappa_1,\ \iota_2\kappa_2 \Rightarrow \kappa_2,\ \iota_3\kappa_3 \Rightarrow \kappa_3,\ \iota_1\kappa_1 \Rightarrow \iota_1,\ \iota_2\kappa_2 \Rightarrow \iota_2,\ \iota_3\kappa_3 \Rightarrow \iota_3\}$, vmul

$\{\iota_3 \Rightarrow \iota_3,\ \iota_1\kappa_1 \Rightarrow \kappa_1,\ \iota_2\kappa_2 \Rightarrow \kappa_2,\ \kappa_3 \Rightarrow \kappa_3\}$, vmul

The first call pattern corresponds to the same input pattern of the procedure, while the second call pattern corresponds to every recursive call. Both call patterns contain the arrow $\iota_i\kappa_i \Rightarrow \kappa_i$ for $i = 1, 2$. The first contains the arrow $\iota_3\kappa_3 \Rightarrow \kappa_3$ and the second the arrow $\kappa_3 \Rightarrow \kappa_3$. If we make the conjunction of this denotation with the input pattern $\{\iota_i \Rightarrow \iota_i,\ \kappa_i \Rightarrow \kappa_i \mid 1 \le i \le 3\}$, the $i$-th argument of every concrete call pattern ($\kappa_i$) is free. Note that the second abstract call pattern says that the third argument of every concrete call pattern for vmul ($\kappa_3$) is free independently from the input pattern (indeed, it contains the arrow $\kappa_3 \Rightarrow \kappa_3$). This means that in every recursive call found in the concrete resolution, the third argument is always free.

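The denotation of multiply contains more interesting information. It contains call patterns for multiply as well as call patterns for vmul:

$\{\iota_1\kappa_1 \Rightarrow \kappa_1,\ \iota_2\kappa_2 \Rightarrow \kappa_2,\ \iota_3\kappa_3 \Rightarrow \kappa_3,\ \iota_1\kappa_1 \Rightarrow \iota_1,\ \iota_2\kappa_2 \Rightarrow \iota_2,\ \iota_3\kappa_3 \Rightarrow \iota_3\}$, multiply

$\{\iota_1\kappa_1 \Rightarrow \kappa_1,\ \iota_3\kappa_3 \Rightarrow \kappa_3\}$, multiply

$\{\iota_2\kappa_2 \Rightarrow \iota_2,\ \iota_1\kappa_1 \Rightarrow \kappa_1,\ \iota_2\kappa_2 \Rightarrow \kappa_2,\ \iota_3\kappa_3 \Rightarrow \kappa_3\}$, vmul

$\{\iota_1\kappa_1 \Rightarrow \kappa_1,\ \iota_2\kappa_2 \Rightarrow \kappa_2,\ \kappa_3 \Rightarrow \kappa_3\}$, vmul

$\{\iota_1\kappa_1 \Rightarrow \kappa_1,\ \iota_3\kappa_3 \Rightarrow \kappa_3\}$, vmul

$\{\iota_1\kappa_1 \Rightarrow \kappa_1,\ \kappa_3 \Rightarrow \kappa_3\}$, vmul

Both call patterns for multiply contain the arrow $\iota_3\kappa_3 \Rightarrow \kappa_3$. The presence of this arrow in the second call pattern for multiply means that our domain has been able to capture the freeness dependency of the variable T on the third argument of the head of the second clause for multiply. If we make the conjunction of the denotation above with an input pattern for multiply whose third argument ($\iota_3$) is free ($\{\iota_3 \Rightarrow \iota_3,\ \kappa_i \Rightarrow \kappa_i \mid 1 \le i \le 3\}$), we conclude that every concrete call pattern for multiply will have its third argument ($\kappa_3$) free. As noted in [20], if we assume that multiply is always called with its third argument free and the first two arguments ground, that information allows the two calls vmul(VM, V, H) and multiply(MM, V, T) to be executed in AND parallelism [14].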

10 Conclusion 

This paper provides the first new domain to be constructed solely by use of the 
linear refinement technique. This domain, which is for freeness analysis, enjoys a 
natural definition and simple abstract operators. An interesting consequence of 
our investigation is that the standard approach of linear refinement did not lead 
to a useful domain. Instead, we found a novel approach which we called internal 
dependencies. 

The precision of $\mathit{Free}^{\to}_V$ needs to be compared with alternative domains for freeness analysis, for instance the $\mathit{Sharing}_V \times \mathit{Free}_V$ domain. It has already been shown that for just the freeness information, the domain $\mathit{Sharing}_V \times \mathit{Free}_V$ contains redundant information and that $(\downarrow\!\mathit{Sharing}_V) \times \mathit{Free}_V$ is just as precise [24]. The domain $\downarrow\!\mathit{Sharing}_V$ [10] is a subset of $\mathit{Sharing}_V$, formed by those points whose sharing sets are downward closed with respect to set inclusion. It can be shown that the $\mathit{Rep}_V$ domain is a subset of the reduced product $(\downarrow\!\mathit{Sharing}_V) \times \mathit{Free}_V$. Therefore, freeness analysis performed with $\mathit{Rep}_V$ cannot be more precise than freeness analysis performed with $(\downarrow\!\mathit{Sharing}_V) \times \mathit{Free}_V$ (or $\mathit{Sharing}_V \times \mathit{Free}_V$ [24]). Our opinion is that the two domains provide the same information with respect to freeness. This argument still needs to be investigated further, possibly proving it through the quotient technique [7].

Condensing does not hold for our domain. As shown in [22], under suitable conditions a domain is condensing if and only if it is closed with respect to $\to^{\star^{\wp(H_V)}}$. Our domain is not closed with respect to such a refinement. This is not surprising, since we refined a basic domain for freeness with respect to an operation that is not unification. Note, however, that our domain still yields correct results using a goal-independent semantics. The actual gain in precision when using a goal-dependent semantics is not easily quantifiable without an experimental evaluation.
Abstract. Binary Decision Graphs are an extension of Binary Decision Diagrams that can represent some infinite boolean functions. Three refinements of BDGs corresponding to classes of infinite functions of increasing complexity are presented. The first one is closed by intersection and union, the second one by intersection, and the last one by all boolean operations. The first two classes give rise to a canonical representation which, when restricted to finite functions, coincides with the classical BDDs. The paper also gives new insights into the notion of variable names and the possibility of sharing variable names, which can be of interest in the case of finite functions.



1 Introduction 

Binary Decision Diagrams (BDDs) were first introduced by Randal E. Bryant 
in [4]. They turned out to be very useful in many areas where manipulation 
of boolean functions was needed. They allowed a real breakthrough of model 
checking [7], they have been used successfully in artificial intelligence [14] and in 
program analysis [8,13,2,16]. 

One limitation of BDDs and their variants is that they can only represent finite functions. This induces a well-known and quite annoying restriction on model checking, and it restrains their use in program analysis. This paper explores the possibility of extending BDDs so that they can also represent infinite functions. This extension will allow the model checking of some infinite state systems or unbounded parametric systems, or the static analysis of the behavior of infinite systems, where the expression of infinite properties such as fairness is necessary.

After a presentation of our notations, sections 3 and 4 present the main ideas that allow this extension: the first idea is the possibility of sharing variable names for different entries, while preserving a sound elimination of redundant nodes. The second idea is the possibility of looping in the representation (hence the term Binary Decision Graph (BDG) instead of Binary Decision Diagram), while preserving the uniqueness of the representation. Sections 5 to 8 present three classes of infinite boolean functions of increasing complexity corresponding to further refinements of the representation by BDGs. Only the last one is closed by all boolean operations, but the first two are representable by efficient unique BDGs, which, when used to represent finite functions, give classical BDDs. Both classes have results on approximation properties that can easily be exploited in abstract interpretation [9], a fact that is very promising for their usefulness in program analysis.



2 Boolean Functions, Vectors and Words 

Let $\mathbb{B} = \{\mathit{true}, \mathit{false}\}$ be the set of boolean values. $\mathbb{B}^n$ denotes the set of boolean vectors of size $n$. A finite boolean function is a function of $\mathbb{B}^n \to \mathbb{B}$. $\mathbb{B}^\omega$ denotes the set of infinite boolean vectors. An infinite boolean function is a function of $\mathbb{B}^\omega \to \mathbb{B}$.

A boolean function $f$ is entirely characterized by a set of vectors, which is defined as $\{u \mid f(u) = \mathit{true}\}$. Thus, we will sometimes write $u \in f$ for $f(u) = \mathit{true}$. This characterization is interesting to distinguish between the variables of the functions and their position in the function, which we call its entries. If $f : \mathbb{B}^n \to \mathbb{B}$, then the entries of $f$ are the integers between $0$ and $n - 1$. If $f : \mathbb{B}^\omega \to \mathbb{B}$, then the entries of $f$ are $\mathbb{N}$, the set of all natural numbers. We write $u_{(i)}$ for the projection of $u$ on its $i$-th value. Given a set $I$ of entries, $u_{(I)}$ denotes the subvector of $u$ with $I$ as its set of entries. The restriction of $f$ according to one of its entries $i$ and to the boolean value $b$ is denoted $f|_{i \leftarrow b}$ and is defined as the set of vectors: $\{u \mid \exists v \in f,\ v_{(i)} = b \text{ and } v_{(\{j \mid j \neq i\})} = u\}$. A special case is when $i = 0$. In this case, we simply write $f(b)$. We extend this notation to any vector whose size is smaller than the vectors in $f$.

It is sometimes convenient to consider a boolean vector as a word over $\mathbb{B}$. It allows the use of concatenation of vectors. If $u$ is a finite vector and $v$ a vector, the vector $u.v$ corresponds to the concatenation of the words equivalent to the vectors $u$ and $v$. The size of a vector $u$ is written $|u|$. The empty word is denoted $\varepsilon$. We define formally the notation $f(u)$:

$$f(u) = \{v \mid u.v \in f\} \quad \text{if } f : \mathbb{B}^\omega \to \mathbb{B}, \text{ or } f : \mathbb{B}^n \to \mathbb{B} \text{ and } |u| \le n.$$

So, if for all vectors $f$ is false, we can write $f = \emptyset$.

We extend the concatenation to sets of vectors: for example, if $f$ is a set of vectors and $u$ a vector, $u.f = \{u.v \mid v \in f\}$.
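For finite functions these notations are easy to experiment with. The sketch below assumes that a function is represented by its set of accepted vectors, each vector being a tuple of 0s and 1s; the names `restrict`, `apply_prefix` and `concat` are hypothetical.

```python
def restrict(f, i, b):
    """Restriction of f on entry i to value b: keep the vectors whose entry i
    is b and drop that entry."""
    return {v[:i] + v[i + 1:] for v in f if v[i] == b}

def apply_prefix(f, u):
    """f(u): the suffixes v such that u.v belongs to f."""
    return {w[len(u):] for w in f if w[:len(u)] == u}

def concat(u, f):
    """u.f: prefix every vector of f with u."""
    return {u + v for v in f}

f = {(0, 0, 1), (0, 1, 1), (1, 0, 0)}
```

With this representation the special case mentioned in the text, the restriction on entry 0, coincides with `apply_prefix` on a vector of length one.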

To improve the clarity of the article, we adopt the following conventions:

$a, b, c$ represent boolean values,
$e, f, g$ represent boolean functions,
$u, v, w$ represent vectors; in a context where we have infinite vectors, they represent finite vectors,
$\alpha, \beta, \gamma$ represent infinite vectors,
$x, y, z$ represent variables (or entry names),
$r, s, t$ represent binary trees,
$i, j, k, n$ represent natural numbers.

In the description of vectors, to reduce the size of the description, we will write 0 for false, and 1 for true.
3 Entry Names

In classical BDDs, boolean functions are described by boolean expressions using 
variable names corresponding to the different entries of the BDDs. The Binary 
Decision Diagrams are ordered, so that the ordering imposed on the variables 
follows the order of the entries of the function. The variable name of rank i is 
called the name of the entry i. In many cases, the variable names correspond to 
some entities related by the boolean function. A consequence is that we can have 
the same information while changing the ordering on the variable names and the 
boolean functions that bind them (so that a given entry always corresponds to 
the same variable name, and the ordering on the variable names is the same as 
the entries ordering). Thus the different optimizations depend on the choice of 
this ordering [10,12]. 



3.1 Equivalent Entries 

In the case of infinite functions, we cannot assign a different variable name to 
all the entries of the boolean function, because we want a finite representation. 
The idea is that the set of variable names associated with a given function is 
finite and to achieve that, some entries can share the same name. However, not 
every entry can share the same name: to share the same name, two entries must 
be in a sense equivalent. 

A permutation is a bijection of $\mathbb{N} \to \mathbb{N}$. If $I \subseteq \mathbb{N}$, a permutation of the entries in $I$ is a permutation $\sigma$ such that $\forall i \notin I$, $\sigma(i) = i$. A permutation $\sigma$ defines a function from vectors to vectors, $\vec{\sigma}$, defined by: $\vec{\sigma}(u)_{(i)} = u_{(\sigma(i))}$ for every entry $i$ of $u$.

Definition 1 (Equivalent entries). Let $f$ be a boolean function. The entries contained in the set $I \subseteq \mathbb{N}$ are equivalent if and only if whatever the permutation $\sigma$ of the entries in $I$, whatever the vector $u \in \mathrm{dom}(f)$, $f(u) = f(\vec{\sigma}(u))$.

There are two ideas underlying the definition of equivalent entries: the restriction according to any equivalent entry is the same, so $f|_{x \leftarrow b}$, where $x$ is an entry name, is not ambiguous; and whatever the ordering of the equivalent entries, the function is the same. The following example shows that this property imposes that we allow infinite permutations.
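For a finite function, Definition 1 can be checked by brute force over all permutations of the candidate set of entries. This sketch assumes that the function is given as a Python predicate on tuples of bits of length `n`; the name `equivalent` and the sample function are illustrative only.

```python
from itertools import permutations, product

def equivalent(f, n, I):
    """Check that the entries in I are equivalent for f, a predicate on tuples
    of n bits: f must be invariant under every permutation of the entries in I."""
    I = sorted(I)
    for image in permutations(I):
        sigma = dict(zip(I, image))          # identity outside I
        for u in product((0, 1), repeat=n):
            permuted = tuple(u[sigma.get(i, i)] for i in range(n))
            if f(u) != f(permuted):
                return False
    return True

# sample function: (entry 0 or entry 1) and entry 2
sample = lambda u: bool((u[0] or u[1]) and u[2])
```

In the sample function the first two entries play symmetric roles, whereas exchanging the first and the last entry changes the function.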



Example 1. Consider the infinite function that is true on any infinite vector containing two consecutive 0s infinitely many times. This function contains $(001)^\omega$ but not $(01)^\omega$. If we only considered finite permutations, that is permutations generated from finite exchange of entries, then all entries of this function are equivalent. But there is an infinite permutation that transforms $(001)^\omega$ into $(01)^\omega$:

$$\sigma(3 \times k) = 2 \times k, \qquad \sigma(3 \times k + 1) = 4 \times k + 1, \qquad \sigma(3 \times k + 2) = 4 \times k + 3.$$

[Diagram: the permutation $\sigma$ drawn as arrows between the entries of $(001)^\omega$ and those of $(01)^\omega$.]

The meaning of this substitution is that, if a function with all entries equivalent contains $(001)^\omega$ then there is a way of giving the values of $(01)^\omega$ such that it is accepted by the function. Concerning our function, it is consistent to forbid the equivalence of all entries.

As an immediate consequence, we have the following properties for functions $f$ where all entries are equivalent:

Proposition 1. Let $v$ be a word, and $b$ a boolean value in $v$. Then $v^\omega \in f$ if and only if $(v.b)^\omega \in f$.

Proposition 2. Let $\alpha$ be a word where a boolean value $b$ appears infinitely often. Then $\alpha \in f$ if and only if $b.\alpha \in f$.

Proof. In both cases, we have an infinite permutation of the entries that transforms the first vector into the other one. In the second case, we just have to shift the $b$'s of $b.\alpha$ to the right, each $b$ going to the entry of the next one. In the first case, we keep shifting by one more $b$ for each $v.b$. □

3.2 Equivalent Vectors of Entries 

In order to accept more functions, we extend the notion of equivalent entries to 
equivalent vectors of entries. We just consider as one entry a whole set of entries 
of the form $\{i \in \mathbb{N} \mid k \le i < k + n\}$. This allows the iteration over whole vectors 
of entries. The set of equivalent entries is described by a set $I$ of indexes and a 
length $n$ such that $\forall k \in I$, $\forall i$ such that $k < i < k + n$, $i \notin I$. A substitution $\sigma$ 
over such a set is such that $\forall k \in I$, $\sigma(k) \in I$, and $\forall i < n$, $\sigma(k + i) = \sigma(k) + i$. 
For all other numbers $j$, $\sigma(j) = j$. 

Definition 2 (Equivalent Vectors of Entries). Let $f$ be a boolean function. 
The vectors of entries contained in the set $I$ with length $n$ are equivalent if 
and only if, whatever the permutation $\sigma$ of the entries in $I$ and whatever the vector 
$u \in dom(f)$, $f(u) = f(\vec{\sigma}(u))$. 

Two entries can have the same name if and only if they are at the same 
position in a set of equivalent vectors of entries. We suppose in the sequel that 
every boolean function comes with such a naming of the entries. We will write 
$name_f(i)$ for the name of the entry $i$ of function $f$. To simplify the presentation 
and the proofs, we will only consider simple equivalent entries in the sequel of 
the article, but the results extend easily to equivalent vectors of entries. 

3.3 Equivalent Entries and Redundant Choices 

Redundant choices are used in BDDs to reduce the size of the representation. 
The good news is that giving the same name to equivalent entries is compatible 
with the elimination of redundant choices. There is a redundant choice at a 
subvector $u$ of $f$ if and only if $f(u.0) = f(u.1)$. 

Theorem 1. Let $f$ be a boolean function, and $u$ be a vector such that $f(u.0) = 
f(u.1)$. Then, whatever $v$ such that $name_f(|u.v|) = name_f(|u|)$, $f(u.v.0) = 
f(u.v.1)$. 

Proof. Let $v = a.w$. We have $f(u.a.w.0) = f(u.0.w.a)$ because of the equivalence of 
the entries, $f(u.0.w.a) = f(u.1.w.a)$ by redundancy of the choice, and $f(u.1.w.a) 
= f(u.a.w.1)$ by equivalence of the entries. Thus $f(u.v.0) = f(u.v.1)$. □ 

3.4 Periodicity of the Entries 

In order to be finitely representable, we impose some regularity on the entry 
names. The entry names are said to be periodic if and only if there is a period $k$ 
on the entry names, that is, for all $i$, the name of $i + k$ is the same as the name 
of $i$. The entry names are said to be ultimately periodic if, after some point, they 
are periodic. Throughout the paper, we will assume that the entry names are 
ultimately periodic. 
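
An ultimately periodic naming is finitely described by a finite prefix followed by a repeated period. The sketch below is only an illustration; the function `entry_name` and the encoding of a naming as two lists are assumptions, not part of the original presentation.

```python
def entry_name(prefix, period, i):
    """Name of entry i (0-indexed) for the naming prefix.(period)^omega."""
    if i < len(prefix):
        return prefix[i]
    return period[(i - len(prefix)) % len(period)]


# A naming with three initial entries x, y, y followed by a repeated name z'.
names = (["x", "y", "y"], ["z'"])
print([entry_name(*names, i) for i in range(6)])
```

A naming is periodic in the strict sense when the prefix is empty.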



4 Decision Trees 

4.1 Finite Decision Trees 

BDDs are based on decision trees. A decision tree is a structured representation 
based on Shannon’s expansion theorem: $f = 0.f(0) \cup 1.f(1)$. This observation is 
the basis of a decision procedure: to know whether a given vector is in a function, 
we look at its first value. If it is a 0, we iterate the process on the rest of the 
vector and $f(0)$, and if it is a 1, we iterate on the rest of the vector and $f(1)$. 
This procedure can be represented by a binary tree labeled by the entry names, 
and with either true or false at the leaves. 

We define a labeled binary tree $t$ as a partial function of $\{0, 1\}^* \to L$, where 
$L$ is the set of labels, and such that whatever $u.v \in dom(t)$, $u \in dom(t)$. The 
subtree of $t$ rooted at $u$ is the tree denoted $t_{[u]}$ of domain $\{v \mid u.v \in dom(t)\}$, 
and defined as $t_{[u]}(v) = t(u.v)$. The decision tree defined by a boolean function 
$f : B^n \to B$ is the binary tree of domain $\bigcup_{k \le n} \{0, 1\}^k$ such that if $|v| = n$, then 
$t(v) = f(v)$, and if $|v| < n$, $t(v) = name_f(|v|)$. 

Example 2. Let $f = \{000, 011, 111\}$. If we associate the variable names $x$ to 
entry 0, $y$ to entry 1 and $z$ to entry 2, then $f$ can be described by the formula: 
$(y \wedge z) \vee (\neg x \wedge \neg y \wedge \neg z)$. The decision tree for $f$ is displayed in Fig. 1. 
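
The construction of the decision tree of Example 2 can be sketched as follows. This is a minimal illustration, assuming a finite function is given as a set of bit strings and a tree as nested tuples `(name, subtree_for_0, subtree_for_1)` with booleans at the leaves; these encodings and the names `decision_tree` and `decide` are illustrative choices.

```python
from itertools import product


def decision_tree(f, names):
    """Full decision tree of the finite function f, one level per entry name."""
    def build(prefix):
        if len(prefix) == len(names):
            return prefix in f                      # leaf: true or false
        return (names[len(prefix)], build(prefix + "0"), build(prefix + "1"))
    return build("")


def decide(tree, u):
    """Follow the path defined by the vector u down to a leaf."""
    for bit in u:
        tree = tree[1] if bit == "0" else tree[2]
    return tree


f = {"000", "011", "111"}
tree = decision_tree(f, "xyz")
for bits in product("01", repeat=3):
    u = "".join(bits)
    print(u, decide(tree, u))
```

The tree built this way is the unshared, unreduced one; sharing and elimination of redundant choices come afterwards.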



4.2 Semantics of the Decision Trees 

The decision tree of a boolean function is used as a guide for a decision process 
that decides the value of the function on a given vector. If $t$ is the decision tree 
associated with the function $f$, then to decide whether the vector $u$ is in $f$, we 
“read” the first value of $u$. Say $u = b.v$. If $b$ is a 0, we iterate on $v$, $t_{[0]}$, and 
if it is a 1, we iterate on $v$, $t_{[1]}$. The entire decision process goes through the 
tree, following the path defined by $u$, and the result of the decision process is 
the value of the leaf, $t(u)$. Binary Decision Diagrams are based on two remarks: 
first, we can represent any tree in a form where equivalent subtrees are shared (a 
directed acyclic graph), and second, if a choice is redundant, then we can skip it. 
The second remark modifies slightly the decision process: we must have separate 
information on the entries of a function. For example, we can have the sequence 
of the entry names, or equivalently, if all entry names are different, an ordering 
on the entry names. In this way, we can keep track of the current entry that is 
read from $u$. If it is “before” the entry named $t(\epsilon)$, we can skip the first value of 
$u$ and iterate on $v$, $t$. 

Fig. 1. The decision tree, the decision tree with shared subtrees, and the BDD. 

Example 2 (continued). The two steps of sharing equivalent subtrees and 
eliminating redundant nodes are shown in Fig. 1. The decision process on the vector 
101 can reach false after reading the first 1 and the first 0. The entry names 
of the BDD are represented by $x < y < z$, or equivalently by $xyz$. If we realize 
that the entry 1 and the entry 2 are equivalent, we can give the same name $y$ to 
both entries. Then the BDD becomes: 

[Diagram: the BDD of $f$ with the entries 1 and 2 both named $y$.] 

with entry names described as $xyy$. It is easy to see that this description will 
lead to a more efficient algorithm to compute $f|_{2 \leftarrow 0}$, which corresponds to $f|_{z \leftarrow 0}$ 
with the entry names $xyz$, and to $f|_{y \leftarrow 0}$ with the entry names $xyy$. 
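
For a finite function, the equivalence of entries can be checked by brute force over all permutations and all vectors. The sketch below does this for $f = \{000, 011, 111\}$ and also compares the restrictions on the two equivalent entries. The set-of-strings encoding, the reading of restriction as "fix the entry and drop it", and the helper names are assumptions made for illustration.

```python
from itertools import permutations, product


def permute(u, entries, image):
    """Move the value of each entry in `entries` to the matching entry in `image`."""
    v = list(u)
    for src, dst in zip(entries, image):
        v[dst] = u[src]
    return "".join(v)


def equivalent(f, n, entries):
    """Finite version of Definition 1: f is invariant under permutations of `entries`."""
    vectors = ["".join(bits) for bits in product("01", repeat=n)]
    return all((u in f) == (permute(u, entries, image) in f)
               for image in permutations(entries)
               for u in vectors)


def restrict(f, i, b):
    """Restriction of f where entry i is fixed to b and then removed."""
    return {u[:i] + u[i + 1:] for u in f if u[i] == b}


f = {"000", "011", "111"}
print(equivalent(f, 3, (1, 2)))                 # entries named y in xyy
print(equivalent(f, 3, (0, 1)))
print(restrict(f, 1, "0") == restrict(f, 2, "0"))
```

When the entries are equivalent, restricting on either of them yields the same function, which is why a restriction by name is not ambiguous.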

It is an established fact [4] that, given a boolean function and a naming of 
the entries with all names different, such a representation is unique, leading to 
tautological equivalence testing. From Theorem 1, we can add that this 
representation is still unique if the naming of the entries respects the equivalences of 
the entries. 

4.3 Infinite Trees 

To extend this definition to infinite boolean functions, there are two problems. First, for 
all $\alpha \in f$, $\alpha \notin dom(t)$, because the domains of binary trees are limited to finite words. 
This is the problem of the infinite behavior of the function. We can represent 
$t(v)$ for all $v$ prefix of a vector in $f$, but then the tree is infinite. This is the 
second problem, treated in this section: how to represent an infinite tree. 

As we have seen, a BDD is a decision tree on which we have performed two 
operations: first the sharing of equivalent subtrees, second the elimination of 
redundant choices. In fact, following this process would be too inefficient, and 
when manipulating BDDs, these operations are performed incrementally: each 
time we build a tree with root $x$ whose two children are the same tree $t$, we 
return $t$, and each time we build another tree with root $x$ and children $t_0$ and $t_1$, 
we first look if the tree has already been encountered, through a hash table for 
example, and if it is the case, we return the tree already encountered; if not, we 
add it to the table. 
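
The incremental construction just described is commonly implemented with a unique table. The following sketch illustrates it for finite functions only, assuming all entry names are distinct; the class `Builder`, its method `mk`, and the tuple encoding of nodes are illustrative assumptions rather than the implementation used by the author.

```python
class Builder:
    """Builds reduced, maximally shared decision diagrams for finite functions."""

    def __init__(self):
        self.unique = {}                     # hash table of nodes already built

    def mk(self, name, low, high):
        if low == high:                      # redundant choice: skip the node
            return low
        key = (name, low, high)
        # Return the node already encountered, or record this one.
        return self.unique.setdefault(key, key)


def build(builder, f, names, prefix=""):
    """Build the diagram of the finite function f (a set of bit strings)."""
    if len(prefix) == len(names):
        return prefix in f
    return builder.mk(names[len(prefix)],
                      build(builder, f, names, prefix + "0"),
                      build(builder, f, names, prefix + "1"))


def decide(node, u, names):
    """Decision process that skips the entries whose choice was eliminated."""
    while not isinstance(node, bool):
        name, low, high = node
        node = low if u[names.index(name)] == "0" else high
    return node


b = Builder()
root = build(b, {"000", "011", "111"}, "xyz")
print(len(b.unique), decide(root, "101", "xyz"))
```

On the function of Example 2, the decision process on 101 reaches false after the first 1 and the first 0, as stated in the text. Handling cycles, which infinite trees require, is not covered by this sketch.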

The same operations, albeit a little more complex, can be performed to repre- 
sent an infinite tree with maximal sharing of its subtrees. First we only represent 
regular trees, that is trees with a finite number of distinct subtrees. The only 
difference with finite trees, which are represented by directed acyclic graphs, 
is that infinite trees are represented by directed graphs that do contain cycles. 
The added complexity introduced by the cycles is not intractable, and efficient 
incremental algorithms can be devised. The ideas are the following: when we are 
not in a cycle, the algorithm is the same as in the finite case. When we isolate 
a strongly connected subgraph (a “cycle”), we first see if this cycle is not the 
unfolding of another cycle that is reachable from the subgraph. If it is the case, 
we return this other cycle (we fold the subgraph on the cycle). If not, we reduce 
the subgraph to an equivalent one with maximal sharing, and then we compute 
unique keys for the subgraph, so that we can see if it had already been encoun- 
tered, or so that we can recognize it in the future. We have one key for each 
node of the subgraph. The detailed algorithms and their proofs can be found in 
[17]. 

Examples of infinite trees represented this way will be displayed in the next 
sections. The trees will be progressively enriched so that the BDGs represent 
wider classes of infinite functions. 

5 Basic BDGs: Open Infinite Functions 

If we just extend BDDs with the possibility of sharing entry names and the 
possibility of using infinite trees, we can already represent open infinite functions. 
Our first restriction though, is that the decision tree is representable, that is, it 
is regular. 

Definition 3. Let f be a boolean function, f is said to be prefix regular if and 
only if the number of distinct f{u) is finite. 

Because we will have only one possible representation for a given function, and 
$f(u)$ corresponds to $t_{[u]}$ if $t$ represents $f$, it means that $t$ is regular. 

5.1 Open Functions and Closed Functions 

Definition 4 (Open Function). Let $f : B^\omega \to B$. $f$ is said to be open if and 
only if $f$ is prefix regular and: 

$\forall \alpha \in f$, $\exists u$ such that $\alpha = u.\beta$ and $f(u) = B^\omega$ 

Recall that $f(u) = B^\omega$ means that whatever $\gamma$, $f(u.\gamma) = true$. This definition 
corresponds to a choice on the meaning of an infinite decision tree labeled by 
entry names and with true and false as leaves. If during the decision process 
of a vector, we reach a false, then the result of $f$ on the vector must be false. 
If we read a true, then the result of $f$ on the vector must be true. If $u$ is the 
beginning of the vector that leads to true, then every infinite vector beginning 
with $u$ will be in $f$, so $f(u) = B^\omega$. The last case is when we never reach true 
nor false. The choice is that such vectors are not in the function. By choosing 
that such vectors are in the function, we would have represented dually the 
closed functions. So, the duals of the properties of open functions apply to closed 
functions. 

Because we chose that cycling in the decision process is rejecting, there is 
one new source of non-uniqueness that is not taken care of by simply sharing 
every subtree of the decision tree. We must also replace by false every cycle 
from which no true is reachable. This is easily performed while treating cycles 
in the representation of regular trees. With this treatment, we still have a unique 
representation for open functions. 

Example 2 (continued). The function $f$ can be extended to the function $g : 
B^\omega \to B$, defined as: $g(u.\alpha) = true$ if $u \in f$. Then $g$ is represented with the 
same diagram as $f$, with an additional entry name $z'$, and the entry names are 
$xyy(z')^\omega$. 

Example 3. Let $f$ be true on $\alpha$ if and only if $\alpha$ contains at least one 1. Every 
entry of $f$ is equivalent, so its entry names can be described as $x^\omega$. Since $f$ is 
open, it can be represented by the following graph: 

[Diagram: a graph with a single node, a loop, and a true leaf.] 
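
The decision process for open functions cannot be run on an arbitrary infinite vector, but it can be decided on ultimately periodic vectors $u.v^\omega$ by detecting when the process revisits the same node at the same position of the period. The sketch below assumes a graph encoded as a dictionary mapping each node to its pair of successors (on 0, on 1), with booleans as leaves, and a single shared entry name so that no entry is skipped; the encoding and the name `accepts_open` are assumptions for illustration.

```python
def accepts_open(graph, root, prefix, period):
    """Decide whether prefix.(period)^omega is in the open function."""
    node = root
    for bit in prefix:
        if isinstance(node, bool):
            return node                      # a leaf was reached in the prefix
        node = graph[node][int(bit)]
    seen = set()
    pos = 0
    while not isinstance(node, bool):
        if (node, pos) in seen:
            return False                     # cycling forever is rejecting
        seen.add((node, pos))
        node = graph[node][int(period[pos])]
        pos = (pos + 1) % len(period)
    return node


# The function of Example 3: true if and only if the vector contains a 1.
at_least_one_1 = {"n": ("n", True)}
print(accepts_open(at_least_one_1, "n", "000", "0"))
print(accepts_open(at_least_one_1, "n", "", "001"))
```

The rejecting answer on a detected cycle reflects the choice made in the text that vectors never reaching true nor false are not in the function.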



5.2 Boolean Operators 

Theorem 2. Let $f$ and $g$ be two open functions. Then the functions $f \wedge g$ and 
$f \vee g$ are open. Moreover, if $(f_i)_{i \in \mathbb{N}}$ is a family of open functions, then $\bigvee_{i \in \mathbb{N}} f_i$ 
is an open function. 

$(f \wedge g)(\alpha) = f(\alpha) \wedge g(\alpha)$, and $(f \vee g)(\alpha) = f(\alpha) \vee g(\alpha)$. So, if we consider $f$ and $g$ 
as sets of vectors, $f \wedge g$ is the intersection of $f$ and $g$, and $f \vee g$ is the union of 
$f$ and $g$. 

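Proof. Let $\alpha \in f \vee g$. There is a $u$ such that $\alpha = u.\beta$, and either $f(u) = B^\omega$ or 
$g(u) = B^\omega$. In any case, $(f \vee g)(u) = B^\omega$. If $\alpha \in f \wedge g$, there are $u$ and $v$ such that 
$\alpha = u.\beta$, $\alpha = v.\gamma$, and $f(u) = B^\omega$ and $g(v) = B^\omega$. If $|u| \le |v|$, then $v = u.w$. So 
$f(v) = B^\omega$. So, $(f \wedge g)(v) = B^\omega$. If $\alpha \in \bigvee_{i \in \mathbb{N}} f_i$, there is a least $u$ prefix of $\alpha$ 
such that there is an $i$, $f_i(u) = B^\omega$. We have $(\bigvee_{i \in \mathbb{N}} f_i)(u) = B^\omega$. □ 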

Dually, the finite union of closed functions is a closed function, and the infinite 
intersection of closed functions is a closed function. 

Corollary 1. Whatever the boolean function f, there is a greatest open function 
contained in f , and there is a least closed function containing f. 

Algorithmically, it is easy to compute the and or the or of two open func- 
tions. The algorithms are the same as in the finite case [5], with the possibility 
of memoizing [18], except that we must take care of cycles. When a cycle is en- 
countered, that is when we recognize that we already have been through a pair 
of subtrees (s,t), we build a loop in the resulting tree. 
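
The handling of cycles can be sketched as a product construction in which a pair of subtrees that has already been visited closes a loop in the result. The code below is an illustration under simplifying assumptions: graphs map a node to its pair of successors (on 0, on 1), leaves are booleans, all entries share one name, and the result is not reduced (no sharing of equivalent nodes, no elimination of redundant choices or of cycles from which no true is reachable). The names `apply_op` and `accepts_open` are hypothetical.

```python
def apply_op(op, g1, r1, g2, r2):
    """Graph of (f op g) for op in {"and", "or"}; returns (graph, root)."""
    absorbing = (op == "or")       # true absorbs for "or", false for "and"
    result = {}

    def go(s, t):
        if s is absorbing or t is absorbing:
            return absorbing
        if isinstance(s, bool) and isinstance(t, bool):
            return not absorbing   # both operands are the neutral leaf
        key = (s, t)
        if key not in result:
            result[key] = None     # mark the pair before recursing
            s0, s1 = (s, s) if isinstance(s, bool) else g1[s]
            t0, t1 = (t, t) if isinstance(t, bool) else g2[t]
            result[key] = (go(s0, t0), go(s1, t1))
        return key                 # an already visited pair closes a loop

    return result, go(r1, r2)


def accepts_open(graph, root, prefix, period):
    """Open-function decision on the vector prefix.(period)^omega."""
    node, seen, pos = root, set(), 0
    for bit in prefix:
        if isinstance(node, bool):
            return node
        node = graph[node][int(bit)]
    while not isinstance(node, bool):
        if (node, pos) in seen:
            return False
        seen.add((node, pos))
        node = graph[node][int(period[pos])]
        pos = (pos + 1) % len(period)
    return node


has_one = {"a": ("a", True)}       # contains at least one 1
has_zero = {"b": (True, "b")}      # contains at least one 0
both, root = apply_op("and", has_one, "a", has_zero, "b")
print(accepts_open(both, root, "", "0"), accepts_open(both, root, "", "01"))
```

Marking a pair before recursing plays the role of memoization and of loop detection at the same time.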

Open functions are not closed by negation: the negation of the function that 
is true on all vectors containing at least one 1 is the function containing only $0^\omega$. 
Such a function is not open, because the only infinite behavior that is possible 
for an open function is trivial. In order to be more expressive, we introduce more 
infinite behaviors. 

6 More Infinite Behaviors 

To allow more infinite behaviors, we need to have more than one kind of loop, so 
that in some loop it is forbidden to stay forever, and in some others, we can. We 
introduce a new kind of loop: loops over open functions. This new kind of loop 
defines a new set of infinite behaviors, defining what we call iterative functions. 
Iterative functions are functions that start over again and again infinitely often. 
Thus entry names will have to be periodic. 

Definition 5 (Iterative Function). Let $f : B^\omega \to B$. $f$ is said to be iterative 
if and only if the entry names of $f$ are periodic, and there is an open function $g$ 
such that for all $\alpha$ in $f$, there is an infinite sequence of vectors $(u_i)_{i \in \mathbb{N}}$ such that 
$\alpha = u_0.u_1 \ldots u_i \ldots$ and each $u_i$ has the minimum length such that $g(u_i) = B^\omega$ 
and $name_f(|u_i|) = name_f(0)$. We write $f = \Omega(g)$. 

Hence an iterative function is represented by an open function. We will use 
the decision tree of the open function to represent the iterative function. But in 
the context of iterative functions, the decision tree will have a different meaning, 
corresponding to a slightly different decision process. The decision process is the 
following: we follow the decision tree in the path corresponding to the vector, 
but when we reach a true, we start again at the root of the tree. To be a success, 
the decision process must start again an infinite number of times. 
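
The modified decision process can be sketched for ultimately periodic vectors: a vector is accepted when the process never reaches false and restarts at the root infinitely often, which on $u.v^\omega$ amounts to finding a restart inside the cycle of (node, position) states. The graph encoding (node mapped to its successors on 0 and on 1, booleans as leaves), the assumption that the root is not itself a leaf, and the three example graphs are illustrative choices consistent with the behaviors described in Examples 4 to 6, not reproductions of the original figures.

```python
def accepts_iterative(graph, root, prefix, period):
    """Decide whether prefix.(period)^omega is in the iterative function."""
    def step(node, bit):
        nxt = graph[node][int(bit)]
        if nxt is True:
            return root, 1             # reached true: restart at the root
        return nxt, 0

    node = root
    for bit in prefix:
        node, _ = step(node, bit)
        if node is False:
            return False
    seen = {}                          # (node, pos) -> restarts counted so far
    restarts, pos = 0, 0
    while (node, pos) not in seen:
        seen[(node, pos)] = restarts
        node, hit = step(node, period[pos])
        if node is False:
            return False
        restarts += hit
        pos = (pos + 1) % len(period)
    # Accepted if at least one restart happens inside the repeated cycle.
    return restarts > seen[(node, pos)]


safety = {"x": (True, False)}                       # true on 0^omega only
liveness = {"x": (True, "x")}                       # infinitely many 0's
fairness = {"x": ("y", "z"), "y": ("y", True), "z": (True, "z")}
print(accepts_iterative(safety, "x", "", "0"))
print(accepts_iterative(liveness, "x", "0", "1"))
print(accepts_iterative(fairness, "x", "", "01"))
```

The same graph would denote a different function if it were read as an open function, which is the change of meaning described above.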

Example 4 (Safety). 

[Diagram: a node $x$ with leaves true and false.] 

represents the function that is true on $0^\omega$ only. 

Example 5 (Liveness). 

[Diagram: a graph with a true leaf.] 

represents the function that is false only on those vectors 
that end with $1^\omega$. That is, the function is true on any 
vector containing an infinite number of 0’s. 

Example 6 (Fairness). 

[Diagram: a graph rooted at $x$ with a true leaf.] 

represents the function that is true on any vector 
containing an infinite number of 0’s and 1’s. 



These examples show that iterative functions can be used to represent a 
wide variety of infinite behaviors. Note that $\emptyset$ and $B^\omega$ are at the same time open 
and iterative. Another remark: the equivalence of the entries, which restricts the use 
of shared entry names, only applies to the iterative function, not to the open 
function that represents the iterative function. 



Theorem 3. An iterative function is prefix regular. 

Proof. If $f = \Omega(g)$, $g$ is open, so prefix regular. If $g(u) = g(v)$, then $f(u) = f(v)$. 
So the set of distinct $f(u)$ is smaller than the set of distinct $g(u)$. Thus, $f$ is 
prefix regular. □ 



Many possible open functions can represent the same iterative function: 



Example 6 (continued). The function that is true on every vector with an infinite 
number of 0’s and 1’s could also be represented by the following open function: 

[Diagram: an alternative graph with a true leaf.] 



In fact, there is a “best” open function representing a given iterative function. If 
we always choose this best open function, as the representation of open function 
is unique, the representation of iterative function is unique too. 

Theorem 4. Let $f$ be an iterative function. The function $g$ which is true on the 
set $\{u.\alpha \mid u^\omega \in f \text{ and } f(u) = f\}$ is the greatest (for set inclusion) open function 
such that $f = \Omega(g)$. 

In order to simplify the proofs, we will suppose that all entries of $f$ are equivalent, 
so that there is only one entry name, and we can get rid of the test $name_f(|u|) = 
name_f(0)$. To reduce the problem to this case, we can use a function over larger 
finite sets described by $B^k$, where $k$ is a period of the entry names of $f$. 

Lemma 1. Let $u$ be a vector such that $g(u) = B^\omega$ and for all $v$ prefix of $u$, 
$g(v) \neq B^\omega$. Then $u^\omega \in f$ and $f(u) = f$. 

Proof (Lemma 1). Whatever $\alpha$, $u.\alpha \in g$. Because of the minimality of $u$ with 
respect to the property $g(u) = B^\omega$, for each $\alpha$, there is a $v$ prefix of $\alpha$ such that 
$(u.v)^\omega \in f$ and $f(u.v) = f$. Let $b$ be a boolean value in $u$. We choose $\alpha = b^\omega$. 
There is an $l$ such that $(u.b^l)^\omega \in f$. Because all entries of $f$ are equivalent, by 
Proposition 1, $u^\omega \in f$. Let $\alpha \in f$. $\alpha$ is infinite, so there is a boolean value $a$ 
that is repeated infinitely often in $\alpha$. But there is an $l$ such that $f(u.a^l) = f$. So 
$u.a^l.\alpha \in f$, and by the equivalence of all entries (Proposition 2), $u.\alpha \in f$, which 
means that $\alpha \in f(u)$. Conversely, if $\alpha \in f(u)$, $u.a^l.\alpha \in f$, and so $\alpha \in f$. Thus 
$f = f(u)$. □ 

Proof (Theorem 4). $g$ is an open function, because whatever the element $\alpha$ of 
$g$, there is a $u$ prefix of $\alpha$ such that $\forall \beta$, $u.\beta \in g$. $f$ is iterative, so there is an 
open function $g'$ such that $f = \Omega(g')$. If $\alpha \in g'$, there is a $u$ prefix of $\alpha$ such that 
$g'(u) = B^\omega$. Let $u_0$ be the least such $u$. Because $f = \Omega(g')$, $u_0^\omega$ is in $f$, and 
$f(u_0) = f$. So $\alpha \in g$, which means that $g' \subseteq g$. To prove the theorem, we just 
have to prove that $f = \Omega(g)$. We will start by $f \subseteq \Omega(g)$, then prove $\Omega(g) \subseteq f$. 

Let $\alpha \in f$. We suppose $\alpha \notin \Omega(g)$. If there is a $u$ prefix of $\alpha$ such that 
$g(u) = B^\omega$, let $u_0$ be the least such $u$. $\alpha = u_0.\beta$. Then $\beta \notin \Omega(g)$, but $f(u_0) = f$, 
so $\beta \in f$. So we can iterate on $\beta$. This iteration is finite because $\alpha \notin \Omega(g)$, so we 
come to a point where there is no $u$ prefix of $\alpha$ such that $g(u) = B^\omega$. But $\alpha \in f$, 
so there is a $u_0$ prefix of $\alpha$ such that $g'(u_0) = B^\omega$, and $u_0^\omega \in f$ and $f(u_0) = f$, 
and so $g(u_0) = B^\omega$, which contradicts the hypothesis. Thus $f \subseteq \Omega(g)$. 

Let $\alpha \in \Omega(g)$. Let $(u_i)_{i \in \mathbb{N}}$ be the sequence of words such that $g(u_i) = B^\omega$, 
$\alpha = u_0.u_1 \ldots u_i \ldots$ and $u_i$ minimum. Some $u_i$ appear infinitely often in $\alpha$, and 
some other $u_i$ appear only finitely often in $\alpha$. Hence there is a permutation of 
the entries such that the result of the permutation on $\alpha$ is $v.\beta$, where $v$ is the 
concatenation of all $u_i$ that appear finitely in $\alpha$ (times the number of times they 
appear), and $\beta$ is composed of those $u_i$ that appear infinitely in $\alpha$. By definition 
of $\Omega(g)$, $\beta \in \Omega(g)$, and because all entries of $f$ are equivalent, $v.\beta$ is in $f$ if and 
only if $\alpha$ is in $f$. But whatever $i$, $f(u_i) = f$ (see the lemma). So $f(v) = f$. And 
so $\alpha$ is in $f$ if and only if $\beta$ is in $f$. Either $\beta$ contains a finite number of distinct 
$u_i$, or an infinite one. 

If $\beta$ contains a finite number of distinct $u_i$, we call them $(v_i)_{i \le m}$. Then there 
is a permutation of the indexes such that the result of the permutation on $\beta$ is 
$(v_0.v_1 \ldots v_m)^\omega$. We know that $v_m^\omega \in f$ (see the lemma), and for all $i$, $f(v_i) = f$, 
so $v_0.v_1 \ldots v_{m-1}.(v_m)^\omega \in f$. We call $\gamma = v_0.v_1 \ldots v_{m-1}.(v_m)^\omega$. Because $\gamma \in f$, 
there is a sequence $(u'_i)_{i \in \mathbb{N}}$ such that $g'(u'_i) = B^\omega$, $\gamma = u'_0.u'_1 \ldots u'_i \ldots$ and $u'_i$ 
minimum. So there is a $j$ such that $u'_0.u'_1 \ldots u'_j = v_0.v_1 \ldots v_{m-1}.v_m^l.w$ with 
$w$ prefix of $v_m$. Whatever $i$, $(u'_0.u'_1 \ldots u'_i)^\omega \in f$, because $f = \Omega(g')$. So, by 
Proposition 1, $(v_0.v_1 \ldots v_m)^\omega \in f$. This in turn means that $\beta \in f$. 

If $\beta$ contains infinitely many distinct $u_i$, we call them $(v_i)_{i \in \mathbb{N}}$. Necessarily, 
there are infinitely many 0’s and 1’s in $\beta$. So we have two $v_i$, $w_0$ and $w_1$, such that 
$w_0$ contains a 0 and $w_1$ contains a 1. As $\beta$ is composed of 0 and 1, there is a 
permutation of the entries that transforms $\beta$ into $(w_0.w_1)^\omega$. So we are back to the 
problem with $\beta$ containing a finite number of $u_i$. 

Thus, $\beta \in f$ whatever the case, which proves that $\alpha \in f$. We started from 
$\alpha \in \Omega(g)$, so $\Omega(g) \subseteq f$. Because we already proved $f \subseteq \Omega(g)$, we have $f = 
\Omega(g)$. □ 



7 BDGs with rec Nodes: ω-deterministic Functions 

We combine a finite behavior with infinite behaviors to express a wide variety 
of infinite functions. We call these functions ω-deterministic because we restrict 
the number of possible infinite behaviors at a given $u$ to at most one iterative 
function. 



Definition 6. Let $f : B^\omega \to B$. $f$ is ω-deterministic if and only if $f$ is prefix 
regular and $\forall u$, $\Omega(\{v.\alpha \mid u.v^\omega \in f \text{ and } f(u.v) = f(u)\}) \subseteq f(u)$. 



7.1 The Decision Tree 

If the set $\{v \mid u.v^\omega \in f \text{ and } f(u.v) = f(u)\}$ is not empty, then it is possible, in 
the decision process, that we enter an infinite behavior. It must be signaled in the 
decision tree. To this end, we introduce a new kind of node in the decision tree, 
the rec node. The rec node signals that we must start a new infinite behavior, 
because before this node, we were in fact in the finite part of the function. The 
rec node has only one child. In the graphical representation, we will sometimes 
draw a rec node and its child labeled $x$ as a single node [diagram omitted]. After 
a rec node, we start the decision tree representing the iterative function. We know 
that when this decision tree comes to a true, we must 
start again just after the previous rec node. false nodes in the decision tree 
are replaced by the sequel of the description of the ω-deterministic function. As 
the iterative functions of the ω-deterministic function are uniquely determined, and 
their representation is unique, the decision tree of the ω-deterministic function 
is unique. 

Note that open functions are ω-deterministic. Their representation as an 
ω-deterministic function is the same as in the previous section, but with a rec 
preceding every true. It means also that the restriction of this representation to 
finite functions gives the classical BDD, except for the rec preceding the true. 

7.2 The Semantics of the Decision Tree 

The semantics of the decision tree is defined in terms of a pseudo-decision process 
(it is not an actual decision process because it is infinite). The decision process 
reads a vector and uses a stack $S$ and a current iterative tree, $r$. At the beginning, 
the stack is empty and $r$ is the decision tree. When we come to a true node, we 
stack it and start again at $r$. When we come to a rec node with child $t$, $r$ becomes $t$ 
and we empty the stack. If we come to a false node, we stop the process. The process 
is a success if it doesn’t stop and the stack is infinite. 
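
On ultimately periodic vectors the pseudo-decision process becomes an actual decision procedure: once the triple (node, current iterative tree, position in the period) repeats, the stack grows without bound exactly when the repeated cycle stacks at least one true and never passes through a rec node. The sketch below follows this reading. The node encoding (`("in", on_0, on_1)` for input nodes, `("rec", child)` for rec nodes, booleans for leaves), the treatment of an iterative tree reduced to true as accepting any continuation, and the two example graphs are assumptions chosen to match the behaviors described in the text, not the original figures.

```python
def accepts_rec(graph, root, prefix, period):
    """Decide whether prefix.(period)^omega succeeds in the rec-node process."""
    state = {"r": root, "pushes": 0, "clears": 0}

    def settle(node):
        # Follow true and rec nodes, which consume no input.
        while True:
            if node is True:
                if state["r"] is True:
                    return True              # iterative tree is just `true`
                state["pushes"] += 1         # stack a true, restart at r
                node = state["r"]
            elif node is not False and graph[node][0] == "rec":
                state["r"] = node = graph[node][1]
                state["clears"] += 1         # new iterative tree, empty stack
            else:
                return node

    node = settle(root)
    for bit in prefix:
        if isinstance(node, bool):
            return node
        node = settle(graph[node][1 + int(bit)])
    seen, pos = {}, 0
    while not isinstance(node, bool):
        key = (node, state["r"], pos)
        if key in seen:
            pushes, clears = seen[key]
            return state["pushes"] > pushes and state["clears"] == clears
        seen[key] = (state["pushes"], state["clears"])
        node = settle(graph[node][1 + int(period[pos])])
        pos = (pos + 1) % len(period)
    return node


# A graph for the function {0^omega, 1^omega}.
two_words = {"root": ("in", "ra", "rb"), "ra": ("rec", "a"), "rb": ("rec", "b"),
             "a": ("in", True, False), "b": ("in", False, True)}
# A graph accepting exactly the vectors ending with 0^omega.
ends_with_zeros = {"top": ("rec", "x"), "x": ("in", True, "top")}
print(accepts_rec(two_words, "root", "", "0"))
print(accepts_rec(ends_with_zeros, "top", "1", "0"))
```

Counting stack resets rather than storing the stack itself is enough here, because only the growth of the stack matters for success.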



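Example 7. 

[Diagram: a decision tree with root $x$ and leaves true and false; the subtrees below the rec nodes are described in the text.] 

When we read a 0, the current iterative tree becomes the tree with root $x$, 
0-branch true and 1-branch false. If we read a 1 after that, we stop on a failure, 
and if we read a 0, we stack a true and start again with the same iterative 
tree. So, after a 0, we can only have $0^\omega$. After a 1, the iterative 
tree becomes the tree with root $x$, 0-branch false and 1-branch true, and this 
time, we can only have $1^\omega$. 

So this function is $\{0^\omega, 1^\omega\}$. 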



Example 8. 

[Diagram omitted.] 

During the decision process, the iterative tree never changes. 
When we read a 0, we stack a true and start again. But each 
time we read a 1, we empty the stack. So the only vectors that 
stack an infinite number of true are the vectors ending with $0^\omega$. 



7.3 Boolean Operators 

Theorem 5. Let $f$ and $g$ be two ω-deterministic functions. Then $f \wedge g$ is 
ω-deterministic. 

The algorithm building the decision tree representing f A g identifies the loops, 
that is we come from a (t, u), which are subtrees of the decision trees representing 
/ and g, and return to a (t, u). If in such a loop, we have not encountered any new 
true in any decision tree, we build a loop, if one decision process has progressed, 
we keep building the decision tree, and when both have been through a true, 
we add a true in the intersection. 

ω-deterministic functions are not closed by union. As they are closed by 
intersection, it means that they are not closed by negation either. 

Example 9 (Impossible Union). Let $f_1$ be the set of all vectors with a finite 
number of 1’s, and $f_2$ the set of all vectors with a finite number of 0’s. $f_1$ and 
$f_2$ are represented by: 

[Diagram: two graphs, each with a true leaf.] 

Let $f = f_1 \vee f_2$. The set $\Omega(\{u.\alpha \mid u^\omega \in f \text{ and } f(u) = f\})$ is the set of all vectors, 
but $(01)^\omega \notin f$, so $f$ is not ω-deterministic. 

7.4 Approximation 

Theorem 6. Whatever $f$ prefix regular function, there is a least (for set 
inclusion) ω-deterministic function containing $f$. 

The process of building the best ω-deterministic function approximating a prefix 
regular function consists in adding the iterative functions defined by $\{v.\alpha \mid u.v^\omega \in 
f \text{ and } f(u.v) = f(u)\}$ to $f(u)$. This process starts at $\epsilon$ and keeps going for each 
false node in the representation of the iterative function, augmenting $f$ at this 
point. The process is finite because $f$ is prefix regular. 

8 BDGs with Subscripts: Regular Functions 

Regular functions are the closure of ω-deterministic functions by union. The idea 
is to allow a finite set of iterative functions to describe the infinite behavior at 
a given point in the function. 

Definition 7. Let $f : B^\omega \to B$. $f$ is said to be regular if and only if $f$ is prefix 
regular and there is a finite set of non-empty iterative functions $iter(f)$ such that 
$\forall \alpha \in f$, $\exists u$, $\exists g \in iter(f)$, $\alpha \in u.g$ and $g \subseteq f(u)$. 

Such functions are called regular because of the analogy with ω-regular sets 
of words of Büchi [6]. The only restriction imposed by the fact that we consider 
functions lies in the entry names, namely the entry names must be ultimately 
periodic to be finitely representable. 

Theorem 7. A function $f$ is regular if and only if its entry names are ultimately 
periodic and the set of words in $f$ is ω-regular in the sense of Büchi. 

The idea is that open functions define regular languages. If $U$ is the regular 
language defined by an open function, then the associated iterative function is 
$U^\omega$. The idea of the proof of the theorem is that an ω-regular language can be 
characterized as a finite union of $U.V^\omega$, with $U$ and $V$ regular languages. 

Corollary 2. If $f$ and $g$ are regular functions, then $f \wedge g$, $f \vee g$ and $\neg f$ are 
regular functions. 

It is an immediate consequence of the theorem, the closure properties of 
ω-regular languages, and the closure properties of the fact that the set of entry 
names is ultimately periodic. 

Note that the set of regular functions is strictly smaller than the set of 
ω-regular languages: 

Example 10. The set $\{0, 11\}^\omega$ is an ω-regular language, but not a regular 
function. Suppose two entries $i < j$ are equivalent. Then $0^{i-1}110^\omega$ is in the function, 
so $0^{i-1}10^{j-i}10^\omega$ is in the function too, because of the equivalence of entries $i$ 
and $j$. 

It is possible to define a decision tree that represents a regular function by 
subscripting some labels with a set of indexes corresponding to the iterative 
functions associated with the regular function. The meaning of this subscript is 
that we stack a true for each iterative function in the set of indexes. The problem 
is that this representation depends on the indexing of the iterative functions, and 
there is no canonical way of determining the set of iterative functions that should 
be associated with a given regular function. 

9 Conclusion 

To achieve the representation of infinite functions, we presented a new insight 
on variable names which allows the sharing of some variable names. This sharing 
is compatible with every operation on classical BDDs, at no additional cost. It 
is even an improvement for classical BDDs, as it speeds up one of the basic 
operations on BDDs, the restriction operation. 

We presented three classes of infinite functions which can be represented by 
extensions of BDDs. So far, the only extension that allowed the representation 
of infinite function was presented by Gupta and Fisher in [11] to allow inductive 
reasoning in circuit representation. Their extension corresponds to the first class 
(open functions), but without the uniqueness of the representation, because the 
loops have a name, which is arbitrary (and so there is no guarantee that the 
same loop encountered twice will be shared). 

Our representations for open functions and ω-deterministic functions have been 
tested in a prototype implementation in Java. Of course, this implementation 
cannot compete with the most involved ones on BDDs. It is, however, one of the 
advantages of using an extension of BDDs: many useful optimizations developed 
for BDDs could be useful, such as complement edges [3] or differential BDDs [1]. 
This is a direction for future work. Another direction for future work concerns 
the investigation of regular functions. These functions are closed by boolean 
operations, but we have not yet found a satisfactory unique representation with a 
decision tree. We believe the first two classes will already be quite useful. For 
example, the first class (open functions) is already an improvement over [11], and 
the second class (ω-deterministic) can express many useful properties of temporal 
logic [15]. This work is a step towards model checking and static analysis of 
the behavior of infinite systems, where properties depending on fairness can be 
expressed and manipulated efficiently using BDGs. 

Acknowledgments: Many thanks to Patrick Cousot for reading this work 
carefully and for many useful comments. Ian Mackie corrected many English 
mistakes in the paper. 

Abstract. This paper presents a specializer and a binding-time analyzer 
for a functional language where expressions are allowed to be used as 
both static and dynamic. With both static and dynamic expressions, 
we can statically access data structures while residualizing them at the 
same time. Previously, such data structures were treated as completely 
dynamic, which prevented us from accessing their components statically. 
The technique presented in this paper effectively allows us to lift data 
structures which was prohibited in the conventional partial evaluators. 
The binding-time analysis is formalized as a type system and the solution 
is obtained by solving constraints generated by the type system. We 
prove the correctness of the constraint solving algorithm and show that 
the algorithm runs efficiently in almost linear time. 



1 Introduction 

Given a function $f(x, y)$ and some of its argument $x = a$, program 
specialization (or partial evaluation) transforms $f$ into a more efficient specialized 
program $f_a(y)$ by reducing expressions in $f$ using the information on its known 
argument [14]. Because expressions that depend only on the known argument 
can be performed at specialization time, the execution of the resulting program 
$f_a(y)$ is usually (far) more efficient than the execution of the original function 
applied to the same argument: $f(a, y)$. The specialization technique is widely used 
not only for program optimization [8] but also for the construction of compilers 
and compiler generators from interpreters [10, 14]. It is also applied to runtime 
code generation [9, 17]. 
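
As a small illustration of the idea, the sketch below specializes a power function with respect to a known exponent. It is written in Python rather than in the functional language studied in the paper, and the names `power` and `specialize_power` are hypothetical. The exponent plays the role of the known argument: every test and recursive call that depends only on it is performed at specialization time, and only the multiplications on the unknown argument remain in the residual program.

```python
def power(n, x):
    """General program: both arguments are supplied at run time."""
    return 1 if n == 0 else x * power(n - 1, x)


def specialize_power(n):
    """Return the residual function for a known n, together with its source."""
    # The recursion on n is unfolded now; only operations on x are residualized.
    expr = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"def power_{n}(x):\n    return {expr}\n"
    namespace = {}
    exec(source, namespace)
    return namespace[f"power_{n}"], source


power_3, source = specialize_power(3)
print(source)
print(power_3(2) == power(3, 2))
```

The residual program contains no test on the exponent, which is the source of the efficiency gain described above.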

Specialization is different from normal execution in that it involves 
expressions whose value is unknown at specialization time. They are called dynamic 
expressions.¹ The expressions with known values are called static. Since static 
expressions can be safely executed at specialization time whereas dynamic 
expressions cannot, a specializer needs additional facilities to distinguish them. 
There are mainly two approaches to cope with this: online and offline. 

An online specializer [15] determines binding-times (i.e., whether an expression
is static or dynamic) during specialization. Since the decision is made according
to the actual values used for specialization, it leads to more accurate
binding-times and hence better specialization.

* 7-3-1 Hongo Bunkyo-ku 113-0033 Japan (http://www.is.s.u-tokyo.ac.jp/~asai)
^1 Dynamic expressions are said to be residualized in the specialized (or residual)
program, so that they are executed later when unknown arguments are available.

An offline specializer[14], on the other hand, determines binding-times of 
expressions at a binding-time analysis (BTA) phase performed prior to special- 
ization. Because the binding-time analysis does not use the actual values for the 
analysis, the result usually contains some kind of approximation. However, since
all the binding-times are known at specialization time, the specialization process
itself becomes much simpler and faster. This enables us to practically self-apply
the specializer to obtain compilers and compiler generators from interpreters. 

In this paper, we demonstrate how to remove one of the existing limitations
of offline specializers, i.e., the limitation caused by a static data structure in a
dynamic context. When a data structure (such as a cons cell and an abstraction) 
occurs in a dynamic context, i.e., when a data structure needs to be residualized, 
the conventional offline specializer makes the binding-time of this data structure 
completely dynamic, even if it is used statically somewhere else. For example, 
consider the following term P: 

\[ P = (\lambda x.\ \mathrm{cons}\ (\mathrm{car}\ x)\ x)\,@\,(\mathrm{cons}\ 1\ 2) . \]

When this term is specialized using online specializers, the result naturally
becomes $\mathrm{cons}\ 1\ (\mathrm{cons}\ 1\ 2)$. When we use offline specializers,
however, the binding-time analysis classifies $x$ ($= \mathrm{cons}\ 1\ 2$) as
dynamic because it is residualized in the final result. (Otherwise, the residual
program would contain specialization-time values rather than program texts:
$\mathrm{cons}\ 1\ (1 \,.\, 2)$.) Consequently, $\mathrm{car}\ x$ becomes dynamic,
and the original term itself is returned as the result of specialization without
performing any specialization.^2

The problem could be avoided if we could lift data structures: we transform
the specialization-time value $(1 \,.\, 2)$ into its program text
$\mathrm{cons}\ 1\ 2$. In conventional specializers, however, lifting is allowed
only for constants and not for data structures because it leads to code
duplication.^3 Furthermore, it is not obvious how to lift closures.

The problem here is that x is used as both static and dynamic. Even though 
it is used statically at car x, it becomes dynamic because it is residualized. 
The solution we take here is to carry both the residualized form and the static 
value for such data (after residualizing it in a let-expression). In the subsequent 
specialization, we take the dynamic part for residualization and the static part 
for reduction. This technique enables us to access statically the components of 
residualized data structures. It effectively allows us to lift data structures. 

This flexibility does not come for free, however. Carrying both values spoils 
one of the good properties of offline specialization: static computation can be 
executed efficiently using standard evaluators. To minimize the overhead of car- 
rying both values, we use data structures with both values only when necessary. 

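^2 The assignment of $\mathrm{cons}\ 1\ 2$ to $x$ is performed. However, a
let-expression is inserted to avoid code duplication, producing
$\mathrm{let}\ x = \mathrm{cons}\ 1\ 2\ \mathrm{in}\ \mathrm{cons}\ (\mathrm{car}\ x)\ x$.

^3 For example, specialization of
$(\lambda x.\ \mathrm{cons}\ (\mathrm{car}\ x)\ (\mathrm{cons}\ x\ x))@(\mathrm{cons}\ 1\ 2)$
would duplicate $\mathrm{cons}\ 1\ 2$ as in:
$\mathrm{cons}\ 1\ (\mathrm{cons}\ (\mathrm{cons}\ 1\ 2)\ (\mathrm{cons}\ 1\ 2))$.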
To this end, we handle three kinds of binding-times: static, dynamic, and both. 
Static expressions are reduced efficiently as in the conventional specializers. Dy- 
namic expressions are residualized in let-expressions so that they will never be 
duplicated. Expressions that are reduced statically but are also residualized will 
carry both static and dynamic values with some extra overhead. 

In this paper, we present a monovariant binding-time analysis which identifies 
expressions that are actually used as both static and dynamic. It is formalized 
as a type system, and the result of binding-time analysis is obtained by solving 
constraints generated by the type system. The analysis is efficient: its time
complexity is almost linear in the length of a program, achieving the same
complexity as the conventional binding-time analysis.

A polyvariant binding-time analysis is a more general approach that gives
multiple binding-times to the same expression. However, it is known to be quite
expensive [2]. Rather than employing expensive analyses, we stay in a
monovariant framework to obtain a more efficient, almost linear time algorithm.
Whether or not it is feasible to use a general polyvariant analysis in this
situation is outside the scope of this paper.

The paper is organized as follows. After presenting our language in Sect. 2, 
we show a specializer for it in Sect. 3. Binding-time types are defined in Sect. 4 
and typing rules for annotated terms are presented in Sect. 5. In Sect. 6, these 
typing rules are transformed into typing rules for unannotated terms, giving con- 
straint generation rules. To solve the generated constraints, binding-times are 
first interpreted as a pair in Sect. 7. A two-phase constraint satisfaction
algorithm is presented in Sect. 8, and its correctness is discussed in Sect. 9.
Its time complexity is shown to be almost linear in Sect. 10. Related work is discussed
in Sect. 11 and the paper concludes in Sect. 12. 

2 The Language 

The language we consider is a $\lambda$-calculus with data structures (pairs), constants,
and primitive operators (of which we include only arithmetic addition here). The
syntax is defined as follows:

\[ L = x \mid \lambda x.\,L \mid L@L \mid \mathrm{cons}\ L\ L \mid \mathrm{car}\ L \mid \mathrm{cdr}\ L \mid c \mid L + L . \]

The language does not contain recursion explicitly, but the framework presented 
here can be easily extended to cope with recursion. 

Given a term L in the above syntax, a binding-time analyzer produces an 
annotated term whose syntax is defined as follows: 

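\[
\begin{array}{rcl}
M & = & x \mid \lambda^S x.\,M \mid \lambda^D x.\,M \mid \lambda^B x.\,M \mid M @^S M \mid M @^D M \mid \mathrm{cons}^S\,M\,M \\
  & \mid & \mathrm{cons}^D\,M\,M \mid \mathrm{cons}^B\,M\,M \mid \mathrm{car}^S\,M \mid \mathrm{car}^D\,M \mid \mathrm{cdr}^S\,M \mid \mathrm{cdr}^D\,M \\
  & \mid & c^S \mid c^D \mid c^B \mid M +^S M \mid M +^D M \mid M{\downarrow}^B_S \mid M{\downarrow}^B_D .
\end{array}
\]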



$\lambda^S x.\,M$ and $\lambda^D x.\,M$ are standard static and dynamic abstractions,
respectively. In addition, there is a third kind of abstraction, $\lambda^B x.\,M$.
$\lambda^B x.\,M$ is an abstraction which is used as both static and dynamic. ($B$
for 'both'.) It becomes static when coerced into a static abstraction using
${\downarrow}^B_S$. Likewise, it becomes dynamic when coerced into a dynamic
abstraction using ${\downarrow}^B_D$. We call these two operators
${\downarrow}^B_S$ and ${\downarrow}^B_D$ coercion operators. In contrast to
abstraction, application is either static or dynamic. To apply $\lambda^B x.\,M$,
it has to be coerced into a static or dynamic abstraction beforehand using
${\downarrow}^B_S$ or ${\downarrow}^B_D$.

Pairs and constants are also split into three kinds: static, dynamic, and both. 
The same coercion operators are used to take its static or dynamic part. 

Note that there exists no coercion operator from S to D. This is the same as 
the conventional partial evaluator where coercion from a static data structure to 
a dynamic one is not permitted. As for constants, we will see in the next section 
that the standard lift operator can be used. 

3 Specializer 

Figure 1 shows a specializer for an annotated term $M$. In the figure, a variable
with a superscript $*$ indicates that it is a fresh dynamic variable. Expressions with
a superscript $D$ (in the righthand side of the figure) are residualized expressions.
Among them, $\mathrm{let}^D\ v^* = A\ \mathrm{in}\ B$ is a residualized let-expression
which is used to avoid code duplication. It is an abbreviation of
$(\lambda^D v^*.\,B) @^D A$. $\xi k.\,A$ and $\langle B \rangle$ are shift and reset
operations [6]. Shift behaves like call/cc, but takes only the continuation
delimited by reset. Shift and reset are used to grab the continuation and to
insert let-expressions. $\#_1$ and $\#_2$ are projection functions for a pair $(A, B)$:

\[ \#_1(A, B) = A, \qquad \#_2(A, B) = B . \]

Given a term and an environment p, the specializer S reduces static parts 
and residualizes dynamic parts as usual. When the term is a variable, its value 
in the environment is returned. When the term is a static abstraction, it simply 
returns a static lambda closure. 

When the term is a dynamic abstraction $\lambda^D x.\,M$, a let-expression is inserted
to avoid code duplication/elimination. The rule is essentially the same as
CPS-based specialization/let-insertion used in Similix [3], but is written in direct
style using shift/reset operations. The rule can be read as follows: it first
specializes the body $M$ of the abstraction under the environment where the bound
variable $x$ is bound to a fresh dynamic variable $x^*$. It is then given a name
$v^*$ and residualized in a let-expression. The result of this dynamic abstraction
is this new name, which is then passed to the rest of the specialization (continuation) $k$
grabbed by the shift operation. Since the dynamic abstraction is residualized in 
a let-expression and not passed for further specialization, it will never be dupli- 
cated/eliminated regardless of how the dynamic name is used in the subsequent 
specialization. The specialization of the body of the abstraction and the applica- 
tion to the continuation are enclosed by reset operations so that let-expressions 
inserted within those specializations do not appear before the let-expression in- 
troduced here. This is necessary to maintain the scope of inserted let-expressions 
properly. 
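\[
\begin{array}{lcl}
\mathcal{S}[\![x]\!]\rho & = & \rho(x) \\
\mathcal{S}[\![\lambda^S x.\,M]\!]\rho & = & \lambda x'.\ \mathcal{S}[\![M]\!]\rho[x'/x] \\
\mathcal{S}[\![\lambda^D x.\,M]\!]\rho & = & \xi k.\ \mathrm{let}^D\ v^* = \lambda^D x^*.\ \langle \mathcal{S}[\![M]\!]\rho[x^*/x] \rangle\ \mathrm{in}\ \langle k\,v^* \rangle \\
\mathcal{S}[\![\lambda^B x.\,M]\!]\rho & = & \xi k.\ \mathrm{let}^D\ v^* = \lambda^D x^*.\ \langle \mathcal{D}(\mathcal{S}[\![M]\!]\rho[x^*/x]) \rangle \\
 & & \quad \mathrm{in}\ \langle k\,(v^*, \lambda x'.\ \mathcal{S}[\![M]\!]\rho[x'/x]) \rangle \\
\mathcal{S}[\![M_1 @^S M_2]\!]\rho & = & (\mathcal{S}[\![M_1]\!]\rho)\,(\mathcal{S}[\![M_2]\!]\rho) \\
\mathcal{S}[\![M_1 @^D M_2]\!]\rho & = & \xi k.\ \mathrm{let}^D\ v^* = (\mathcal{S}[\![M_1]\!]\rho) @^D (\mathcal{S}[\![M_2]\!]\rho)\ \mathrm{in}\ \langle k\,v^* \rangle \\
\mathcal{S}[\![\mathrm{cons}^S\,M_1\,M_2]\!]\rho & = & (\mathcal{S}[\![M_1]\!]\rho,\ \mathcal{S}[\![M_2]\!]\rho) \\
\mathcal{S}[\![\mathrm{cons}^D\,M_1\,M_2]\!]\rho & = & \xi k.\ \mathrm{let}^D\ v^* = \mathrm{cons}^D\,(\mathcal{S}[\![M_1]\!]\rho)\,(\mathcal{S}[\![M_2]\!]\rho)\ \mathrm{in}\ \langle k\,v^* \rangle \\
\mathcal{S}[\![\mathrm{cons}^B\,M_1\,M_2]\!]\rho & = & \xi k.\ \mathrm{let}\ (m_1, m_2) = (\mathcal{S}[\![M_1]\!]\rho,\ \mathcal{S}[\![M_2]\!]\rho) \\
 & & \quad \mathrm{in}\ \mathrm{let}^D\ v^* = \mathrm{cons}^D\,\mathcal{D}(m_1)\,\mathcal{D}(m_2)\ \mathrm{in}\ \langle k\,(v^*, (m_1, m_2)) \rangle \\
\mathcal{S}[\![\mathrm{car}^S\,M]\!]\rho & = & \#_1(\mathcal{S}[\![M]\!]\rho) \\
\mathcal{S}[\![\mathrm{car}^D\,M]\!]\rho & = & \xi k.\ \mathrm{let}^D\ v^* = \mathrm{car}^D\,(\mathcal{S}[\![M]\!]\rho)\ \mathrm{in}\ \langle k\,v^* \rangle \\
\mathcal{S}[\![c^S]\!]\rho & = & c \\
\mathcal{S}[\![c^D]\!]\rho & = & \mathit{lift}(c) \\
\mathcal{S}[\![c^B]\!]\rho & = & (\mathit{lift}(c),\ c) \\
\mathcal{S}[\![M_1 +^S M_2]\!]\rho & = & (\mathcal{S}[\![M_1]\!]\rho) + (\mathcal{S}[\![M_2]\!]\rho) \\
\mathcal{S}[\![M_1 +^D M_2]\!]\rho & = & \xi k.\ \mathrm{let}^D\ v^* = (\mathcal{S}[\![M_1]\!]\rho) +^D (\mathcal{S}[\![M_2]\!]\rho)\ \mathrm{in}\ \langle k\,v^* \rangle \\
\mathcal{S}[\![M{\downarrow}^B_D]\!]\rho & = & \#_1(\mathcal{S}[\![M]\!]\rho) \\
\mathcal{S}[\![M{\downarrow}^B_S]\!]\rho & = & \#_2(\mathcal{S}[\![M]\!]\rho)
\end{array}
\]

\[
\mathcal{D}(M) = \left\{ \begin{array}{ll} \#_1(M) & \mbox{if $M$ is both} \\ M & \mbox{if $M$ is dynamic} \end{array} \right.
\]

Fig. 1. Specializer for annotated terms written in direct style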



The rule for $\lambda^B x.\,M$ is similar to the rule for $\lambda^D x.\,M$, except that
it returns both the dynamic variable and the static value instead of returning
only a dynamic variable. This enables us to use it in both static and dynamic
ways. To use it as a static (dynamic) abstraction, we coerce it using
${\downarrow}^B_S$ (${\downarrow}^B_D$), which is defined to take the appropriate
component of the both value. Since $\lambda^B x.\,M$ is also residualized in a
let-expression, it will never be duplicated. Note that $\lambda^B x.\,M$ incurs the
overhead of carrying both static and dynamic values as a pair compared to a
static or dynamic abstraction.

$\mathcal{D}(M)$ appearing in the rule for $\lambda^B x.\,M$ is used to residualize the
body $M$ of $\lambda^B x.\,M$, which can be either $B$ or $D$.^4 Since residualized
expressions must be completely dynamic, $\mathcal{D}(M)$ is used to take the dynamic
part of $M$ when $M$ is both static and dynamic. The use of $\mathcal{D}(\cdot)$ does
not imply that the binding-time of $M$ needs to be checked at specialization time.
Since it is available after the binding-time analysis, the rule for
$\lambda^B x.\,M$ can be split into two rules according to the binding-time of $M$
($B$ or $D$). This is the same for $\mathrm{cons}^B\,M_1\,M_2$, where four cases
($M_1$ and $M_2$ are either $B$ or $D$) are merged into one rule using
$\mathcal{D}(\cdot)$.

Reconfirm here that $\lambda^B x.\,M$ is always residualized in a let-expression. This
incurs a restriction to our binding-time analysis: “The argument of an
abstraction of type $B$ has to be dynamic.” To understand its consequence,
consider the following example:
$(\lambda f.\ \mathrm{cons}\ f\ (f@2))@(\lambda x.\ x + 3)$. Since $\lambda x.\ x + 3$ is
both statically applied and residualized, it is classified as both static and
dynamic. We expect to perform the addition $2 + 3$ at specialization time to
obtain $\mathrm{cons}\ (\lambda x.\ x + 3)\ 5$. Because of the restriction, however,
$x$ becomes dynamic (since $\lambda x.\ x + 3$ is both) and so does the addition.
Thus, the result we obtain is $\mathrm{cons}\ (\lambda x.\ x + 3)\ (2 + 3)$. Because
the value of $x$ is unknown when $\lambda x.\ x + 3$ is residualized, $x$ can only be
dynamic. Readers might think that we could make $x$ both static and dynamic.
This is not the case because $x$ is either static or dynamic, not both static
and dynamic. When $x$ is residualized as an argument of $\lambda x.\ x + 3$, it is
completely dynamic and has no static information. An expression can be both
static and dynamic only when it always has both static and dynamic information
at the same time.

^4 It cannot be $S$ since it is residualized.

This restriction comes from our design choice that the binding-time analysis
we consider is monovariant.^5 If we used a polyvariant binding-time analysis, we
could have assigned to $x$ dynamic only when $\lambda x.\ x + 3$ is residualized and static
when $\lambda x.\ x + 3$ is applied statically. However, this requires us to analyze the
body of lambda expressions more than once. Rather than going into expensive 
polyvariant analyses, we stay in a monovariant framework to obtain efficiency. 
Further investigation is needed to see if the performance penalty of using poly- 
variant analyses for this particular case is actually serious. Note that even with 
this restriction, our binding-time analysis produces better results than the con- 
ventional monovariant binding-time analyses because lambda expressions of type 
B were previously treated as completely dynamic. 

Let us come back to Fig. 1. The rule for a static application is standard. It 
specializes the subterms in a compositional way and applies the result in the 
underlying $\lambda$-calculus. A dynamic application is processed in a similar way but
the application is residualized in a let-expression. The rules for cons, car, and a 
primitive application are similar. 
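
The rules of Fig. 1 can be sketched in Python as follows. This is only an illustration under several assumptions that are not part of the paper: annotated terms are encoded as tuples, residual code is represented as strings, and let-insertion is implemented with an explicit list of pending bindings rather than with shift/reset (the `reset` method plays the role of the reset operation). The names `Both`, `Specializer`, `emit`, and `specialize` are hypothetical.

```python
import itertools

class Both:
    """A value carried in both forms: residual code and static value."""
    def __init__(self, dyn, stat):
        self.dyn, self.stat = dyn, stat

class Specializer:
    def __init__(self):
        self._ids = itertools.count(1)
        self._lets = []                      # pending let-bindings of the innermost reset

    def fresh(self, prefix):
        return f"{prefix}{next(self._ids)}"

    def emit(self, code):
        """Residualize `code` under a fresh name (let-insertion)."""
        name = self.fresh("v")
        self._lets.append((name, code))
        return name

    def reset(self, thunk):
        """Run `thunk` with its own binding list and wrap its result in those lets."""
        saved, self._lets = self._lets, []
        try:
            body, lets = thunk(), self._lets
        finally:
            self._lets = saved
        for name, code in reversed(lets):
            body = f"let {name} = {code} in {body}"
        return body

    @staticmethod
    def dyn(value):
        """D(.): take the dynamic part of a both value, leave dynamic code as is."""
        return value.dyn if isinstance(value, Both) else value

    def spec(self, term, env):
        tag = term[0]
        if tag == "var":                     # ("var", x)
            return env[term[1]]
        if tag == "coerce":                  # ("coerce", "S" | "D", M)
            both = self.spec(term[2], env)
            return both.stat if term[1] == "S" else both.dyn
        ann = term[1]
        if tag == "const":                   # ("const", ann, c); str(c) stands for lift(c)
            c = term[2]
            return {"S": c, "D": str(c), "B": Both(str(c), c)}[ann]
        if tag == "lam":                     # ("lam", ann, x, body)
            _, _, x, body = term
            static = lambda v: self.spec(body, {**env, x: v})
            if ann == "S":
                return static
            xs = self.fresh("x")
            code = self.reset(lambda: self.dyn(self.spec(body, {**env, x: xs})))
            name = self.emit(f"(fun {xs} -> {code})")
            return name if ann == "D" else Both(name, static)
        args = [self.spec(m, env) for m in term[2:]]   # (tag, ann, M1, ...)
        if ann == "S":
            if tag == "app":
                return args[0](args[1])
            if tag == "cons":
                return (args[0], args[1])
            if tag in ("car", "cdr"):
                return args[0][0 if tag == "car" else 1]
            if tag == "add":
                return args[0] + args[1]
        template = {"app": "{} @ {}", "cons": "cons {} {}", "car": "car {}",
                    "cdr": "cdr {}", "add": "{} + {}"}[tag]
        name = self.emit(template.format(*map(self.dyn, args)))
        # Only cons may be annotated B here; it keeps its static components as well.
        return name if ann == "D" else Both(name, tuple(args))

def specialize(term):
    sp = Specializer()
    return sp.reset(lambda: sp.dyn(sp.spec(term, {})))

# The term P from Sect. 1, annotated so that x is "both".
P = ("app", "S",
     ("lam", "S", "x",
      ("cons", "D",
       ("car", "S", ("coerce", "S", ("var", "x"))),
       ("coerce", "D", ("var", "x")))),
     ("cons", "B", ("const", "D", 1), ("const", "D", 2)))

print(specialize(P))
```

In this sketch the pair is residualized once under a fresh name, while its static part is still available to the static car, which is the behaviour the both annotation is meant to provide.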

Finally, the rules for constants return a static constant, a dynamic constant, 
or both. To make a dynamic constant, a standard lift operator is used. Here, the 
type of constants is strictly distinguished to treat constants and data structures 
in a uniform way. This does not mean that we need to carry both static and 
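dynamic constants for constants of type $B$. We can handle constants in the
same way as the conventional partial evaluators by interpreting:

\[
\begin{array}{lcl@{\qquad}lcl}
\mathcal{S}[\![c^S]\!]\rho & = & c & \mathcal{S}[\![c{\downarrow}^B_D]\!]\rho & = & \mathit{lift}(c) \\
\mathcal{S}[\![c^D]\!]\rho & = & \mathit{lift}(c) & \mathcal{S}[\![c{\downarrow}^B_S]\!]\rho & = & c \\
\mathcal{S}[\![c^B]\!]\rho & = & c & & &
\end{array}
\]

and changing the definition of $\mathcal{D}(\cdot)$ accordingly.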



^5 A binding-time analysis is called monovariant when each subexpression to be
analyzed is allowed to have only one binding-time. When multiple binding-times are
allowed, it is called polyvariant.
To specialize a given annotated term $M$, we execute $\mathcal{S}$ with $M$ and an empty
environment in the global context: $\langle \mathcal{S}[\![M]\!]\rho_\phi \rangle$. Here, $\rho_\phi$ is an environment
which raises an error for all variables.

The specializer shown in Fig. 1 avoids code duplication/elimination by in- 
serting let-expressions for residualization of dynamic expressions. This is impor- 
tant for preserving the identity (pointer equality) of a data structure. Although 
let-insertion has been used for this purpose in the conventional partial evalua- 
tors, data structures became completely dynamic once they were residualized in 
let-expressions. The use of both values enables us to use data structures freely
without worrying about whether they are residualized or not.

Readers might think that the introduction of both values would break the 
identity of data structures since they are duplicated for a static version and 
a dynamic version. Fortunately, this is not the case because static computation 
and dynamic computation do not interfere. For static computation, a static value 
itself with its own identity is used. For dynamic computation, a unique name 
attached to a data structure serves as its identity. Because the static identity 
does not remain until the dynamic computation phase,^6 the identity of a data
structure is preserved by regarding these static and dynamic identities as the 
identity of the data structure. 

4 Binding-Time Types 

The binding-time analysis presented in this paper is formalized as a type system. 
Binding-time values $\pi$ and binding-time types $\tau$ are defined as follows:

\[ \pi = S \mid D \mid B \qquad \tau = \pi \mid \tau \to^{\pi} \tau \mid \tau \times^{\pi} \tau \mid \alpha \mid \mu\alpha.\,\tau . \]

Binding-time values are either $S$ (static), $D$ (dynamic), or $B$ (both static and
dynamic). They are also used as binding-time types for constants.
$\tau \to^{\pi} \tau$ and $\tau \times^{\pi} \tau$ are a function type and a pair type,
respectively, whose binding-time value is $\pi$. $\alpha$ is a type variable and
$\mu\alpha.\,\tau$ is a recursive type. They are used to type terms in untyped
languages such as Scheme.^7

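Among binding-time types defined above, only the types derivable from the
following rules are well-formed. Here, $\Delta \vdash \tau : \pi$ represents that a
binding-time type $\tau$ has a binding-time value $\pi$ under an assumption $\Delta$.

\[
\Delta \vdash \pi : \pi
\qquad
\Delta, \alpha : \pi \vdash \alpha : \pi
\qquad
\frac{\Delta, \alpha : \pi \vdash \tau : \pi}{\Delta \vdash \mu\alpha.\,\tau : \pi}
\]

\[
\frac{\Delta \vdash \tau_1 : \pi_1 \qquad \Delta \vdash \tau_2 : \pi_2}
     {\Delta \vdash \tau_1 \to^{\pi} \tau_2 : \pi}
\ \{ \pi \rhd' \pi_1,\ \pi \rhd \pi_2 \}
\qquad
\frac{\Delta \vdash \tau_1 : \pi_1 \qquad \Delta \vdash \tau_2 : \pi_2}
     {\Delta \vdash \tau_1 \times^{\pi} \tau_2 : \pi}
\ \{ \pi \rhd \pi_1,\ \pi \rhd \pi_2 \}
\]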

Rules for functions and pairs have additional requirements on their component
types: $\pi_1 \rhd \pi_2$ and $\pi_1 \rhd' \pi_2$. They are similar to the requirements on function and

^6 If it does, it is not static but dynamic.

^7 For example, $f$ in $\lambda f.\ f@f$ has a type $\mu\alpha.\ \alpha \to^{\pi} \beta$.
pair types found in the conventional type-based binding-time analyses, e.g., “if
a pair is dynamic, its components have to be dynamic.” Here, they are extended
to cope with the both static and dynamic case:

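\[ S \rhd S \qquad S \rhd D \qquad S \rhd B \qquad D \rhd D \qquad B \rhd D \qquad B \rhd B \]

\[ S \rhd' S \qquad S \rhd' D \qquad S \rhd' B \qquad D \rhd' D \qquad B \rhd' D \]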

If $\pi_1$ is static, there is no restriction on $\pi_2$. If $\pi_1$ is dynamic, $\pi_2$
has to be dynamic, too. These are standard requirements on well-formed types. If
$\pi_1$ is both static and dynamic, there are two cases: the one which allows
$\pi_2$ to be both static and dynamic ($\rhd$) and the one which does not
($\rhd'$). The latter is used for the argument of abstractions of type $B$, since
in that case, $B$ is not permitted as described in the previous section.
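
As a small illustration of these requirements, the following Python sketch checks well-formedness of binding-time types built from constants, function types, and pair types. It is a simplification under stated assumptions: type variables and recursive types are left out, types are encoded as strings or tuples, and the names `TRI`, `TRI_PRIME`, `bt_value`, and `is_well_formed` are hypothetical.

```python
# The two relations on binding-time values, written as sets of allowed pairs.
TRI = {("S", "S"), ("S", "D"), ("S", "B"), ("D", "D"), ("B", "D"), ("B", "B")}
TRI_PRIME = {("S", "S"), ("S", "D"), ("S", "B"), ("D", "D"), ("B", "D")}

def bt_value(ty):
    """Return the toplevel binding-time value of a well-formed type.

    A type is "S", "D", "B", or a tuple (kind, pi, t1, t2) with kind "fun" or "pair".
    Raises ValueError when a component requirement is violated.
    """
    if ty in ("S", "D", "B"):
        return ty
    kind, pi, t1, t2 = ty
    p1, p2 = bt_value(t1), bt_value(t2)
    # The argument of a function type uses the stricter relation.
    first = TRI_PRIME if kind == "fun" else TRI
    if (pi, p1) not in first or (pi, p2) not in TRI:
        raise ValueError(f"ill-formed {kind} type with binding-time {pi}: {p1}, {p2}")
    return pi

def is_well_formed(ty):
    try:
        bt_value(ty)
        return True
    except ValueError:
        return False

# A both pair with dynamic components is well-formed ...
print(is_well_formed(("pair", "B", "D", "D")))
# ... but a both function may not take a both argument.
print(is_well_formed(("fun", "B", "B", "D")))
```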

When $\vdash \tau : \pi$ is derivable, we say that a well-formed type $\tau$ has a
binding-time value $\pi$. We use a notation that separates a (toplevel)
binding-time value $\pi$ from its type. We assume the following equivalence on types:

\[ D = D \to^{D} D = D \times^{D} D, \qquad \mu\alpha.\,\tau = \tau[\mu\alpha.\,\tau/\alpha] . \]



5 Typing Rules 

Figure 2 shows typing rules for annotated terms. A judgement $\Gamma \vdash M : \tau$
represents that $M$ has a binding-time type $\tau$ under a type assumption $\Gamma$. These rules
are fairly standard and do not need particular explanation. We note only two 
points. First, the rules produce only well-formed types. (Notice that they do not 
produce a recursive type.) Second, constants and the result of static primitive 
applications are allowed to have arbitrary binding-time values. This reflects the 
fact that static constants can be coerced into dynamic constants using the lift 
operator. 

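As an example of type checking, consider the term $P$ presented in Sect. 1. If
we correctly annotate $P$ as follows:

\[ (\lambda^S x.\ \mathrm{cons}^D\ (\mathrm{car}^S\ x{\downarrow}^B_S)\ x{\downarrow}^B_D)\ @^S\ (\mathrm{cons}^B\ 1\ 2) \]

then its derivation becomes as follows, where $\Gamma$ represents $x : D \times^B D$:

\[
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{\dfrac{\Gamma \vdash x : D \times^B D}{\Gamma \vdash x{\downarrow}^B_S : D \times^S D}}
            {\Gamma \vdash \mathrm{car}^S\ x{\downarrow}^B_S : D}
      \qquad
      \dfrac{\Gamma \vdash x : D \times^B D}{\Gamma \vdash x{\downarrow}^B_D : D}
    }
    {\Gamma \vdash \mathrm{cons}^D\ (\mathrm{car}^S\ x{\downarrow}^B_S)\ x{\downarrow}^B_D : D}
  }
  {\vdash \lambda^S x.\ \mathrm{cons}^D\ (\mathrm{car}^S\ x{\downarrow}^B_S)\ x{\downarrow}^B_D : (D \times^B D) \to^S D}
  \qquad
  \dfrac{\vdash 1 : D \qquad \vdash 2 : D}{\vdash \mathrm{cons}^B\ 1\ 2 : D \times^B D}
}
{\vdash (\lambda^S x.\ \mathrm{cons}^D\ (\mathrm{car}^S\ x{\downarrow}^B_S)\ x{\downarrow}^B_D)\ @^S\ (\mathrm{cons}^B\ 1\ 2) : D}
\]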

The typing rules shown in Fig. 2 are an extension of the conventional
binding-time analyses. By substituting all $B$ with $D$, we find that the rules are standard
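\[
\Gamma, x : \tau \vdash x : \tau
\qquad
\Gamma \vdash c : \pi
\]

\[
\frac{\Gamma, x : \tau_1 \vdash M : \tau_2}{\Gamma \vdash \lambda^S x.\,M : \tau_1 \to^S \tau_2}
\qquad
\frac{\Gamma, x : D \vdash M : D}{\Gamma \vdash \lambda^D x.\,M : D}
\qquad
\frac{\Gamma, x : D \vdash M : \tau \qquad \tau \in \{B, D\}}{\Gamma \vdash \lambda^B x.\,M : D \to^B \tau}
\]

\[
\frac{\Gamma \vdash M_1 : \tau_2 \to^S \tau_1 \qquad \Gamma \vdash M_2 : \tau_2}{\Gamma \vdash M_1 @^S M_2 : \tau_1}
\qquad
\frac{\Gamma \vdash M_1 : D \qquad \Gamma \vdash M_2 : D}{\Gamma \vdash M_1 @^D M_2 : D}
\]

\[
\frac{\Gamma \vdash M_1 : \tau_1 \qquad \Gamma \vdash M_2 : \tau_2}{\Gamma \vdash \mathrm{cons}^S\,M_1\,M_2 : \tau_1 \times^S \tau_2}
\qquad
\frac{\Gamma \vdash M_1 : D \qquad \Gamma \vdash M_2 : D}{\Gamma \vdash \mathrm{cons}^D\,M_1\,M_2 : D}
\qquad
\frac{\Gamma \vdash M_1 : \tau_1 \qquad \Gamma \vdash M_2 : \tau_2 \qquad \tau_i \in \{B, D\}}{\Gamma \vdash \mathrm{cons}^B\,M_1\,M_2 : \tau_1 \times^B \tau_2}
\]

\[
\frac{\Gamma \vdash M : \tau_1 \times^S \tau_2}{\Gamma \vdash \mathrm{car}^S\,M : \tau_1}
\qquad
\frac{\Gamma \vdash M : D}{\Gamma \vdash \mathrm{car}^D\,M : D}
\qquad
\frac{\Gamma \vdash M_1 : S \qquad \Gamma \vdash M_2 : S}{\Gamma \vdash M_1 +^S M_2 : \pi}
\qquad
\frac{\Gamma \vdash M_1 : D \qquad \Gamma \vdash M_2 : D}{\Gamma \vdash M_1 +^D M_2 : D}
\]

\[
\frac{\Gamma \vdash M : B}{\Gamma \vdash M{\downarrow}^B_S : S}
\qquad
\frac{\Gamma \vdash M : D \to^B \tau \qquad \tau \in \{B, D\}}{\Gamma \vdash M{\downarrow}^B_S : D \to^S \tau}
\qquad
\frac{\Gamma \vdash M : \tau_1 \times^B \tau_2 \qquad \tau_i \in \{B, D\}}{\Gamma \vdash M{\downarrow}^B_S : \tau_1 \times^S \tau_2}
\]

\[
\frac{\Gamma \vdash M : B}{\Gamma \vdash M{\downarrow}^B_D : D}
\qquad
\frac{\Gamma \vdash M : D \to^B \tau \qquad \tau \in \{B, D\}}{\Gamma \vdash M{\downarrow}^B_D : D}
\qquad
\frac{\Gamma \vdash M : \tau_1 \times^B \tau_2 \qquad \tau_i \in \{B, D\}}{\Gamma \vdash M{\downarrow}^B_D : D}
\]

Fig. 2. Typing rules for annotated terms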



typing rules for two-level terms. The only exception is the coercion rule from B 
to S, which was previously impossible. This is where we can perform more spe- 
cialization than before. By classifying previously dynamic expressions as both, 
more static computation becomes possible through extracting static parts of 
both values. 

An annotated term $p_{ann}$ is well-annotated if $\vdash p_{ann} : D$ is derivable. It is
the best annotation if the following two properties hold where the first one has
higher priority:^8

1. $p_{ann}$ classifies more subterms as $S$ than any other well-annotated terms of $p$.

2. $p_{ann}$ classifies less subterms as $B$ than any other well-annotated terms of $p$.

The first property is a standard one stating that $p_{ann}$ can perform more static
computation than any other well-annotated term of $p$. It also excludes the case
where a static value is unnecessarily raised to both and is used only statically 
by extracting its static part. The second property says that if the same static 
computation is possible (by the first property), then B should be avoided as 
much as possible to minimize the overhead of carrying both values. Even though 
the second property is not fulfilled, the obtained specialized program is the same 
if the first one is fulfilled. The second property assures that the overhead at spe- 
cialization time required for carrying both values is minimal. It also excludes the 



^8 Here, the words ‘more’ and ‘less’ refer to set inclusion of subterms.
\[
\frac{\Gamma \vdash L_1 : \pi_1 \qquad \Gamma \vdash L_2 : \pi_2}{\Gamma \vdash L_1 + L_2 : \pi}
\ \left\{ \begin{array}{l}
\pi_1 \rhd \pi,\ \pi_2 \rhd \pi,\ \pi_1 \rhd \pi_2,\ \pi_2 \rhd \pi_1, \\
\pi_1 \in \{S, D\},\ \pi_2 \in \{S, D\}
\end{array} \right\}
\]

\[
\frac{\Gamma \vdash L : \tau \qquad \tau = \tau'}{\Gamma \vdash L : \tau'}
\]

Fig. 3. Typing rules for unannotated terms (constraint generation rules)



case where a dynamic value is unnecessarily raised to both and is residualized 
by extracting its dynamic part. 

Given an unannotated term $p$, a binding-time analysis returns a well-annotated
term $p_{ann}$ that has the best annotation in the above sense.

6 Constraint Generation 

Since the typing rules presented in the previous section are for annotated terms, 
they are not directly applicable to type reconstruction of unannotated terms. To 
reconstruct types for unannotated terms, those rules have to be rewritten so that 
they are directed over the syntax of unannotated terms. This rewriting is done 
by (1) combining rules for the same construct but different binding-time values 
and (2) incorporating coercion rules into the combined rules. The resulting rules 
are shown in Fig. 3. They are directed over the syntax of unannotated terms. 
The boxed parts indicate the places where coercion rules are incorporated. 

The rule for a constant is the same as before. The rule for a variable is merged
with the coercion rule. Here, the constraint $\pi \geq \pi'$ shows coercibility
from $\pi$ to $\pi'$ and is defined as follows:

\[ B > S, \qquad B > D, \qquad \pi \geq \pi' \iff (\pi > \pi') \lor (\pi = \pi') . \]

Coercion is permitted only from $B$ to $S$ or $D$ (not from $S$ to $D$). If the inequality
$\pi > \pi'$ holds when a binding-time analysis finishes, an annotation is attached
to the term corresponding to $\pi'$.

The rule for abstraction is obtained by combining three abstraction rules in
Fig. 2. Considering the equality $D = D \to^D D$, it is easy to see that those three
rules correspond to the cases $\pi = S$, $D$, or $B$. The side condition on a function
type ensures its well-formedness. Observe that the rule for equality (the last rule
of Fig. 3) is required to simulate the dynamic abstraction rule in Fig. 2. Coercion
is not required here, because the rule for abstraction can produce the type after
coercion. For example, if we made an abstraction of type $B$ and immediately
coerced it into $S$ (as in $(\lambda^B x.\,M){\downarrow}^B_S$), we could have created a
static abstraction from the beginning instead (as in $\lambda^S x.\,M$).

The rule for application is obtained similarly by combining two application 
rules in Fig. 2. The constraint $\pi \in \{S, D\}$ shows that there exist only static and
dynamic rules for application. The result of application may require coercion if
$L_1$ returns a value of type $B$.

The rules for cons and car are similar to the rules for abstraction and
application. The equality $D = D \times^D D$ is used to combine three rules for cons.
Finally, the rule for primitive application says: “If any of the arguments becomes
dynamic, then the whole primitive application becomes dynamic. In that case, other
arguments must also be dynamic.” This summarizes the two rules for primitive
application in Fig. 2.

Although the rules in Fig. 3 are directed over unannotated terms, types 
of each subterm are not obvious from these rules. To determine types of each 
subterm, a binding-time analysis first collects the type constraints generated 
by the rules in Fig. 3. Then, these constraints are solved to give types of each 
subterm. Once the constraints are solved and binding-time types are determined, 
it is easy to annotate a program because the rules in Fig. 3 correspond to the 
rules in Fig. 2. 

As an example of constraint generation, consider the term $P$ in Sect. 1. Let
$\Gamma$ be $x : \pi_1 \times^{\pi_3} \pi_2$. After unification of types, we obtain the following derivation:

F h a; : 7Ti X '^3 7T2 

F h car X Ti'i F h a; : tti x ’^3 7T2 

F F cons {car x) x : x^^ (tti x^^ 3 h 1 : tti F 2 : 7 T 2 

F Ax. cons {car x) x : tti x~^^ tt 2 x'^^ (tti x’^3 7T2) F cons 1 2 : tti tt 2 

F (Ax. cons {car x) x)@{cons 1 2) : 7t{ x'^^ (tti x^^3 712 ) 
together with the following constraints: 

7r3t>7ri, 7T3 > TTg , 7Tl > 7r{, TTgOTTl, 7T4 > 7 t( , TTs >' 7T3 , 7T4 > TT^, 

7T3t>7r2, 7T3 > 7T3, TTg G (S', F}, TTg O 7T2, 7T4l>7r3, 7T5l>7r4, 7T5G{S',F}. 

The best solution to these constraints (with the initial constraint = D) is:^9

7T3 = 7T5 = S', 7T4 = TT^ = 7T2 = 7T^ = 7T4 = 7T4 = F, 7T3 = B 

which gives us the following expected result:

\[ (\lambda^S x.\ \mathrm{cons}^D\ (\mathrm{car}^S\ x{\downarrow}^B_S)\ x{\downarrow}^B_D)\ @^S\ (\mathrm{cons}^B\ 1\ 2) . \]

Notice that two coercion operators are inserted corresponding to the two
inequalities $\pi_3 \geq \pi_3'$ and $\pi_3 \geq \pi_3''$ whose equality did not hold.



^9 See the following sections for the constraint solving algorithm and the initial constraint.
7 Interpreting Types as Pairs 

In the following sections, we describe the constraint satisfaction algorithm. Rather
than directly solving constraints obtained by the typing rules in Fig. 3, we first
interpret a binding-time value $\pi$ as a pair $(s, d)$. Constraints over $\pi$ are
transformed into constraints over $(s, d)$ accordingly, which are then solved by our
constraint satisfaction algorithm. This is a fairly natural approach since $B$ can
be regarded as a set $\{S, D\}$.

The interpretation of the pair $(s, d)$ is as follows. Assume that an expression
$L$ has a type $\pi = (s, d)$. $s$ is a boolean value indicating whether $L$ is used
statically. To be more concrete, $s$ becomes true when (1) $L$ is an abstraction
and is applied statically, (2) $L$ is a pair and is decomposed statically by car
or cdr, or (3) $L$ is a constant and is applied to a primitive statically.
$s = \mathit{false}$ indicates that $L$ is not used statically. $d$ is a boolean
value indicating whether $L$ is residualized or not. If it is, $d$ becomes true.
If not, false.

The following table summarizes the correspondence between $\pi$ and $(s, d)$.
When $s$ is true and $d$ is false, it means that $L$ is used but not residualized,
i.e., static. When $s$ is false and $d$ is true, $L$ is residualized but not used,
i.e., dynamic. $s = d = \mathit{true}$ means that $L$ is both used and residualized,
i.e., $L$ is both. $s = d = \mathit{false}$ means that $L$ is neither used nor
residualized.^10 In this case, we can treat $L$ in an arbitrary way. Here, we
denote it as $\bot$.





               d = false    d = true
  s = false      $\bot$        D
  s = true       S             B



Under this interpretation of binding-time values, the best annotation is re- 
phrased as follows (where the first one has higher priority): 

1. Fewer $d$'s are true.

2. Fewer $s$'s are true.

The first criterion says that an expression should be $S$ as much as possible, while
the second one says that unnecessary $B$ should be avoided.

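The constraints appearing in Fig. 3 can be transformed into constraints over
$(s, d)$ as follows. Assume that $\pi = (s, d)$, $\pi_i = (s_i, d_i)$, and $\Rightarrow$ is an
implication having higher precedence than $\land$.

\[
\begin{array}{lcl}
\pi_1 \rhd \pi_2 & : & (d_1 = \mathit{true}) \Rightarrow (d_2 = \mathit{true}) \ \land\ (s_1 = \mathit{false}) \Rightarrow (s_2 = \mathit{false}) \\
\pi_1 \rhd' \pi_2 & : & (d_1 = \mathit{true}) \Rightarrow (d_2 = \mathit{true}) \ \land\ (d_1 = \mathit{true}) \Rightarrow (s_2 = \mathit{false}) \\
\pi_1 \geq \pi_2 & : & (d_2 = \mathit{true}) \Rightarrow (d_1 = \mathit{true}) \ \land\ (s_1 = \mathit{false}) \Rightarrow (s_2 = \mathit{false}) \\
\pi \in \{S, D\} & : & (s = \mathit{false}) \Rightarrow (d = \mathit{true}) \ \land\ (d = \mathit{true}) \Rightarrow (s = \mathit{false})
\end{array}
\]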

By inspecting all possible variations for $s$ and $d$, it can be verified that each
constraint is equivalent (except for $\bot$) to the original one. Intuitively,
$\pi_1 \rhd \pi_2$ is interpreted as “if $\pi_1$ is residualized, so is $\pi_2$”;
$\pi_1 \rhd' \pi_2$ as “if $\pi_1$ is residualized, $\pi_2$ must be dynamic”;
$\pi_1 \geq \pi_2$ as “if $\pi_2$ is used statically, so is $\pi_1$, and if $\pi_2$ is
residualized, so is $\pi_1$” (taking contraposition of the second constraint).

^10 For example, 2 in the expression $\mathrm{car}\ (\mathrm{cons}\ 1\ 2)$.

Note that for all the constraints $(A) \Rightarrow (B)$, $A$ and $B$ always have the form
$s = \mathit{false}$ or $d = \mathit{true}$. This observation is important in designing
the constraint satisfaction algorithm.
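
The claimed equivalence can be checked mechanically. The Python sketch below enumerates all pairs of binding-time values and compares the relations of Sect. 4 and Sect. 6 with their boolean translations. The encoding table and all function names (`ENCODE`, `tri`, `tri_prime`, `geq`, `in_sd`, `mismatches`) are hypothetical choices made for this illustration.

```python
from itertools import product

# Each binding-time value as a pair (s, d): used statically, residualized.
ENCODE = {"S": (True, False), "D": (False, True), "B": (True, True)}

# The original relations, as sets of allowed pairs.
TRI = {("S", "S"), ("S", "D"), ("S", "B"), ("D", "D"), ("B", "D"), ("B", "B")}
TRI_PRIME = {("S", "S"), ("S", "D"), ("S", "B"), ("D", "D"), ("B", "D")}
GEQ = {("B", "S"), ("B", "D"), ("S", "S"), ("D", "D"), ("B", "B")}

def implies(a, b):
    return (not a) or b

def tri(p1, p2):
    (s1, d1), (s2, d2) = p1, p2
    return implies(d1, d2) and implies(not s1, not s2)

def tri_prime(p1, p2):
    (s1, d1), (s2, d2) = p1, p2
    return implies(d1, d2) and implies(d1, not s2)

def geq(p1, p2):
    (s1, d1), (s2, d2) = p1, p2
    return implies(d2, d1) and implies(not s1, not s2)

def in_sd(p):
    s, d = p
    return implies(not s, d) and implies(d, not s)

def mismatches():
    """List every case where a relation and its translation disagree."""
    bad = []
    checks = (("tri", TRI, tri), ("tri_prime", TRI_PRIME, tri_prime), ("geq", GEQ, geq))
    for a, b in product("SDB", repeat=2):
        for name, table, pred in checks:
            if ((a, b) in table) != pred(ENCODE[a], ENCODE[b]):
                bad.append((name, a, b))
    for a in "SDB":
        if (a in ("S", "D")) != in_sd(ENCODE[a]):
            bad.append(("in_sd", a))
    return bad

print(mismatches())
```

The check covers only $S$, $D$, and $B$; the pair with $s = d = \mathit{false}$ is deliberately left out, matching the exception stated in the text.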

8 Solving Constraints 

Given an unannotated term L, the binding-time analysis first generates type 
constraints as described in the previous sections. Assume that the type of L 
itself is given by $\pi = (s, d)$. Then, the constraints are solved together with the
initial constraint d = true. It ensures that the result of partial evaluation becomes 
a completely dynamic expression. 

Constraint solving is performed in two steps corresponding to the two re- 
quirements for the best annotation: 

Step 1 Assign false to all $d$'s (except for $d$ used in the initial constraint) and
true to all $s$'s. Interpret a constraint $(A) \Rightarrow (B)$ as “if $A$ holds, then
change the value of appropriate $s$ or $d$ so that $B$ holds.” Repeat this operation
until no changes are required.

Step 2 Assign false to all $s$'s. For $d$, use the value of $d$ at the end of step 1.
Interpret all the constraints as their contraposition. That is, $(A) \Rightarrow (B)$ is
interpreted as “if $B$ does not hold, then change the value of appropriate $s$
or $d$ so that $A$ does not hold.” Repeat this operation until no changes are
required.

Step 1 is mainly concerned with d. It starts from the assumption that every
expression is static (s = true and d = false). Then, expressions that have to be
residualized (d = true) are identified. Because the fixed-point iteration starts
from d = false and d becomes true only when necessary, the result is the least
fixed-point in the sense that the smallest number of d’s are true.

Step 1 considers s, too, because the value of s sometimes affects the value of
d. This happens when a constraint of the form $\pi \in \{S, D\}$ appears (such as in
the rule for application). When this constraint appears, $\pi$ should be D only if it
is impossible to assign S (since we want to make as many expressions static as
possible). To achieve this, we first set s = true. Only when s needs to be false
do we assign d = true.

Notice that we have two contradictory requirements here. We want to assign 
S as much as possible. At the same time, if an expression is used only as dynamic, 
we do not want to make it both static and dynamic unnecessarily (because that incurs overhead). The
former requirement says that s should be true as much as possible, whereas 
the latter says s should be false as much as possible. This is the reason why 
our algorithm has two steps. At step 1, the former requirement (which is more 
important) is fulfilled. The latter is satisfied at step 2 to the extent that the 
former is not violated. 






For example, an argument of an abstraction of type B cannot be static. 
Step 2 is concerned with s. It identifies expressions that are actually used
as both static and dynamic among the expressions that are classified as B at step 1.
To do this, all expressions that are classified as B (or D) are first assumed to
be dynamic (i.e., s = false). Then, expressions that are actually used statically
are changed back to B (s = true). Because the fixed-point iteration starts from
s = false and s becomes true only when necessary, the result is the least
fixed-point in the sense that the smallest number of s’s are true.
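
The two steps can be sketched as a naive fixed-point iteration. The Python below is an illustration only and is not the implementation described in this paper. The encoding of an atom as `("s", i)` (meaning s_i = false) or `("d", i)` (meaning d_i = true), the function names, and the three-variable example are all assumptions made for this sketch. The loop simply rescans every constraint until nothing changes, so it does not achieve the almost linear bound discussed in Sect. 10.

```python
from typing import List, Tuple

# Hypothetical encoding: ("s", i) stands for "s_i = false",
# ("d", i) stands for "d_i = true". A constraint (A, B) stands for A => B.
Atom = Tuple[str, int]
Constraint = Tuple[Atom, Atom]


def holds(atom: Atom, s: List[bool], d: List[bool]) -> bool:
    kind, i = atom
    return (not s[i]) if kind == "s" else d[i]


def set_truth(atom: Atom, value: bool, s: List[bool], d: List[bool]) -> None:
    """Change s_i or d_i so that the atom gets the given truth value."""
    kind, i = atom
    if kind == "s":
        s[i] = not value
    else:
        d[i] = value


def solve(n: int, constraints: List[Constraint], root: int):
    # Step 1: everything static (s = true, d = false), except the initial
    # constraint d_root = true. Read A => B as "if A holds, make B hold".
    s, d = [True] * n, [False] * n
    d[root] = True
    changed = True
    while changed:
        changed = False
        for a, b in constraints:
            if holds(a, s, d) and not holds(b, s, d):
                set_truth(b, True, s, d)
                changed = True

    # Step 2: reset every s to false, keep d from step 1, and read each
    # constraint as its contraposition: "if B fails, make A fail".
    s = [False] * n
    changed = True
    while changed:
        changed = False
        for a, b in constraints:
            if not holds(b, s, d) and holds(a, s, d):
                set_truth(a, False, s, d)
                changed = True
    return s, d


# Assumed example: variable 0 is the whole term, variable 1 is accessed
# statically, variable 2 is a data structure that is both accessed
# statically (through 1) and residualized (through 0).
EXAMPLE: List[Constraint] = [
    (("d", 0), ("d", 2)),  # if 0 is residualized, so is 2
    (("s", 1), ("d", 1)),  # 1 is S or D: s1 = false => d1 = true
    (("d", 1), ("s", 1)),  #              d1 = true  => s1 = false
    (("s", 2), ("s", 1)),  # if 1 is used statically, so is 2
]
```

With this encoding, `solve(3, EXAMPLE, root=0)` classifies variable 0 as D (s = false, d = true), variable 1 as S (s = true, d = false), and variable 2 as B (s = true, d = true).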

9 Correctness 

This section describes the correctness of our algorithm. We show that the 
fixed-point iteration changes the solution monotonically (in a certain lattice of 
finite height), that a solution exists, and hence that the algorithm terminates 
giving the least solution satisfying constraints. In the following, we assume that 
true > false. 

Step 1 begins with s = true and d = false. 

Lemma 1. Application of each constraint in step 1 does not decrease the solution
in the following lattice: $(s_1, d_1) \geq (s_2, d_2) \stackrel{\mathrm{def}}{\iff} (s_1 \leq s_2) \wedge (d_1 \geq d_2)$.
Furthermore, once a constraint is satisfied, it will never be violated in step 1.

Since the solution where s = false and d = true (for all s’s and d’s) trivially
satisfies all the constraints, there exists a solution. Hence, step 1 returns a least
solution in which the smallest number of d’s are true.

Step 2 begins with s = false and d with the value at the end of step 1. We
first show that the value of d is the same throughout step 2.

Lemma 2. The value of d is not modified during step 2.

The above lemma says that we can forget about d and concentrate on s in step 2.
This coincides with our intuition that step 2 changes unnecessary B to D.
In the next lemma, we see that step 2 has a similar property to step 1. Note
that the inequality on s in the definition of the lattice is different from Lemma 1.

Lemma 3. Application of each constraint in step 2 does not decrease the solution
in the following lattice: $(s_1, d_1) \geq (s_2, d_2) \stackrel{\mathrm{def}}{\iff} (s_1 \geq s_2) \wedge (d_1 = d_2)$.
Furthermore, once a constraint is satisfied, it will never be violated in step 2.

Because the solution obtained at the end of step 1 satisfies all the constraints, 
there exists a solution. Hence, step 2 returns a least solution in which the smallest 
number of s’s are true among the solutions in which the smallest number of d’s 
are true. 

In the next section, we will see that the value of d does not change during step 2. 
Due to the lack of space, proofs for lemmas are not shown here, but will appear in 
the forthcoming technical report. 

It also changes unnecessary S to $\bot$.

10 Complexity

This section shows that the time complexity of solving constraints is almost
linear in the length of a program. The number of constraints is linear in the length
of a program because the number of constraints generated for each construct
is constant. During type inference, we need to unify types as in conventional
type inference. The unification can be efficiently implemented using the
union/find algorithm [18], which takes $O(N\alpha(N, N))$ time, where N is the length
of a program and $\alpha$ is an inverse of Ackermann’s function. Since $\alpha(N, N)$ is
known to be less than four for all practical programs [18], $O(N\alpha(N, N))$ can be
regarded as almost linear time.
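
For readers unfamiliar with the union/find structure, the following Python sketch shows the standard version with union by rank and path compression, which is the combination that gives the inverse-Ackermann bound. It is a generic textbook illustration; the class and method names are assumptions of this sketch, and it is not taken from the analysis implementation. In a type-inference setting each element would stand for a type variable, and `union` would be called whenever two types are unified.

```python
class UnionFind:
    """Disjoint sets over the integers 0 .. n-1."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on the height of each tree

    def find(self, x: int) -> int:
        # First pass: locate the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every visited node at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x: int, y: int) -> int:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx
```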

Step 1 starts from the initial constraint d = true and then proceeds by prop- 
agating d = true or s = false. Since constraints remain satisfied once they are 
satisfied, each constraint is processed only once. Thus, step 1 finishes in time linear in
the number of constraints, and hence in the length of a program. Step 2 similarly
finishes in linear time by propagating s = true at most once for each constraint.



11 Related Work 

Hornof, Consel, and Noye[13] presented a binding-time analysis for C which 
handles both static and dynamic data structures. They performed forward and 
backward analyses (based on data-flow equations) to identify use and definition 
of data structures. Roughly speaking, use and definition correspond to s and d, 
and forward and backward analyses correspond to the two steps of our algorithm. 
Although developed independently, our work can be seen as an extension of their 
work to functional languages. We reformulated it in a type-based analysis and 
included higher-order functions. 

An almost linear time algorithm for solving constraints was first presented by
Henglein [12], and later refined by Bondorf and Jørgensen [4]. It was also applied
to multi-level specialization by Glück and Jørgensen [11]. We basically followed
the same technique of using the union/find algorithm [18] to obtain an almost linear
time algorithm. Because of the introduction of both static and dynamic
expressions, our algorithm is split into two steps.

Sperber[16] presented an online partial evaluator that handles multi-valued 
binding-times: static, dynamic, and unknown. Although his work is similar to 
our work in that both treat binding-times other than static and dynamic, they 
are rather independent of each other. Sperber introduces the unknown binding- 
time when a static expression and a dynamic expression are merged (through, 
for example, conditional expressions). The binding-time of a value with the unknown
binding-time is checked online at specialization time. Such an expression becomes
dynamic in our framework. On the other hand, a both static and dynamic
expression as presented here is an expression that is known to be used both as static and
dynamic. Such expressions become dynamic in Sperber’s framework.

This work is a continuation of our previous work where we showed that
propagation of both static and dynamic values is useful for specializing programs
containing tests for pointer equality [1]. Because it was based on an online
approach, however, the resulting specializer did not run efficiently compared to
existing offline specializers. This paper showed that the same technique can be
applied to offline settings without spoiling the advantage of the offline
framework.

Type-directed partial evaluation (TDPE) [5] is an efficient partial evaluation
technique, where specialization is directed by the type of expressions. By
systematically applying $\eta$-expansion [7] using types, it reifies (lifts) values to their
program texts. From the TDPE perspective, our work is close to performing
one level of $\eta$-expansion. However, naive application of one-level $\eta$-expansion
leads to code duplication. Since simple let-insertion will make residualized data
completely dynamic, a mechanism similar to ours will be necessary to use TDPE
as a lifting tool for data structures. Also, it seems difficult to bound the time
complexity of TDPE-based lifting. On the other hand, it will be an interesting
research topic to see if the type presented here can be used to direct partial
evaluation in a TDPE manner.

12 Conclusion 

This paper presented a binding-time analysis for a functional language where 
expressions are allowed to be used as both static and dynamic, together with a 
specializer for it. With both static and dynamic expressions, we can statically 
access data structures while residualizing them at the same time. Previously, 
such data structures were treated as completely dynamic, which prevented us 
from accessing their components statically. The technique presented in this
paper effectively allows us to lift data structures, which was prohibited in
conventional partial evaluators. The binding-time analysis is formalized as a type
system and the solution is obtained by solving constraints generated by the type
system. We have proven the correctness of the constraint solving algorithm and
shown that the algorithm runs efficiently in almost linear time. Based on this
framework, we are currently making a partial evaluator for Scheme. As future
work, we are planning to extend this framework to multi-level specialization [11]
where multiple binding-times are allowed.

Acknowledgements. I would like to thank Hidehiko Masuhara and anonymous 
referees for many valuable comments. 
Abstract. We demonstrate that abstract interpretation is useful for
analysing calculi of computation such as the ambient calculus (which
is based on the $\pi$-calculus); more importantly, we show that the entire
development can be expressed in a constraint-based formalism that is
becoming exceedingly popular for the analysis of functional and
object-oriented languages.

The first step of the development is an analysis for counting occurrences
of processes inside other processes (for which we show semantic
correctness and that solutions constitute a Moore family); the second step is a
previously developed control flow analysis that we show how to induce
from the counting analysis (and its properties are derived from those of
the counting analysis using general results).



1 Introduction 

The ambient calculus is a calculus of computation that allows active processes 
to move between sites; it thereby extends the notion of mobility found in Java 
(e.g. [6]) where only passive code may move between sites. The untyped calculus 
was introduced in [3] and a type system for a polyadic variant was presented in 
[4]. The calculus is based on traditional process algebras (such as the $\pi$-calculus)
but rather than focusing on communication (of values, channels, or processes) it 
focuses on the movement of processes between different sites; the sites correspond 
to administrative domains and are modelled using a notion of ambients. We refer 
to Sect. 2 for a review of the ambient calculus. 

Abstract interpretation is a powerful technique for analysing programs by step- 
wise development. One starts with an overly precise and costly analysis and then 
develops more approximate and less costly analyses by carefully choosing
appropriate Galois connections; in this way the semantic correctness of the initial
analysis carries over to the approximate analyses. This technique has
demonstrated its ability to deal successfully with logic languages, imperative languages
and functional languages. Recent papers have studied how to apply the
technique to calculi of computation such as the $\pi$-calculus but have found a need for
assuming that processes were in a somewhat simplified form [8,9].

We show that abstract interpretation can be developed for the ambient calculus
(that contains most of the $\pi$-calculus as well as a number of other constructs)
without the need to assume that processes are in a simplified form. More
importantly, we are able to perform the entire development by expressing the analyses
in a constraint-based formulation that closely corresponds to formulations that
have become popular for the analysis of functional and object-oriented
languages. This is likely to make abstract interpretation more accessible to a wider
community because abstract interpretation is often criticised for starting
with a “low level” trace-based semantics.

The first step is the development of an analysis for explicitly modelling which 
processes can be inside what other processes; in order to model accurately what 
happens when the only process inside some other process actually moves out of 
the process, the analysis incorporates a counting component. As is customary 
for constraint-based formulations (in particular our own based on flow logics) 
this takes the form of specifying a satisfaction relation $C \models P$ for when the set C
of descriptions of processes satisfies the demands of the program P; here C is a
set of tuples that each describe a set of processes. This approach is very natural
for applications such as security and validation where information obtained by
other means needs to be checked before it can be used - much the same ideas
are found in type systems. We then show that the specification is semantically
sound (meaning that, for a suitable extraction function $\eta$, if $C \models P$ and $\eta(P) \in C$
then C contains descriptions $\eta(Q)$ of all processes Q that P can evaluate to) and
that the set of acceptable solutions has a least element (or more precisely that
$\{C \mid C \models P\}$ constitutes a Moore family). The details are covered in Sect. 3.

The second step is to show that a previously developed control flow analysis [7] 
(in the manner of [1,2]) can in fact be induced from the counting analysis; to 
show this we first clarify what it means to induce one constraint-based analy- 
sis from another. We then show that semantic correctness and the existence of 
least solutions carry over from the counting analysis. This shows that abstract 
interpretation is a useful guide also when developing analyses of calculi of com- 
putation. It follows that the theoretical properties established in [7] from first 
principles actually fall out of the general development. We refer to Sect. 4 for 
the details. 



2 The Ambient Calculus 



Ambients are introduced in [3] to provide named places with boundaries within 
which computation can happen. Ambients can be arbitrarily nested and may 
interact through the use of capabilities. 
Syntax. As in [3] the syntax of processes $P, Q \in \mathrm{Proc}$ is given by:

P, Q ::= $(\nu_\mu n)P$                      restriction
      |  $0$                                  inactivity
      |  $P \mid Q$                           composition
      |  $!P$                                 replication
      |  $n^{\ell^a}_{\mu}[P]$                ambient
      |  $in^{\ell^t}\, n_\mu.P$              capability to enter
      |  $out^{\ell^t}\, n_\mu.P$             capability to exit
      |  $open^{\ell^t}\, n_\mu.P$            capability to open

n        names

The restriction $(\nu_\mu n)P$ introduces the new name n and limits its scope to P; 0
does nothing; $P \mid Q$ is P and Q running in parallel; replication provides recursion
and iteration as $!P$ represents any number of copies of P in parallel. By $n^{\ell^a}_{\mu}[P]$
we denote the ambient named n with the process P running inside it. The
capabilities $in^{\ell^t}\, n_\mu$ and $out^{\ell^t}\, n_\mu$ are used to move ambients whereas $open^{\ell^t}\, n_\mu$ is
used to dissolve the boundary of an ambient; this will be made precise when we
define the semantics below. To allow the analyses to deal with the $\alpha$-equivalence
inherent to the calculus, we have added markers $\mu \in \mathrm{Mar}$ to the uses of names.
For a process to be well-formed, all occurrences of a name in a given scope must
have the same marker. As is customary for the flow logics approach to control
flow analysis [1,2] we have also placed labels $\ell^a \in \mathrm{Lab}^a$ on ambients and labels
$\ell^t \in \mathrm{Lab}^t$ on transitions - this is merely a convenient way of indicating “program
points” and is useful when developing the analyses. The sets of names, markers
and labels are left unspecified but are assumed to be non-empty and disjoint.
We write $fn(P)$ for the free names of P.

Semantics. The semantics is given by a structural congruence $P \equiv Q$ and a
reduction relation $P \to Q$ in the manner of the $\pi$-calculus.

The congruence relation of Fig. 1 is a straightforward modification of the
corresponding table of [3]. Furthermore, processes are identified up to renaming of
bound names: $(\nu_\mu n)P \equiv (\nu_\mu m)(P\{n \leftarrow m\})$ if $m \notin fn(P)$. Well-formedness is
preserved under the congruence and the renaming of bound names.

The reduction relation is given in Fig. 2 and is as in [3]. A pictorial representation 
of the three basic rules is given in Fig. 3. Well-formedness is clearly preserved 
under reduction. 

Example 1. Consider the ambient w that contains a probe p: 
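$P \equiv P$
$P \equiv Q \Rightarrow Q \equiv P$
$P \equiv Q \wedge Q \equiv R \Rightarrow P \equiv R$
$P \equiv Q \Rightarrow (\nu_\mu n)P \equiv (\nu_\mu n)Q$
$P \equiv Q \Rightarrow P \mid R \equiv Q \mid R$
$P \equiv Q \Rightarrow {!P} \equiv {!Q}$
$P \equiv Q \Rightarrow n^{\ell^a}_{\mu}[P] \equiv n^{\ell^a}_{\mu}[Q]$
$P \equiv Q \Rightarrow in^{\ell^t}\, n_\mu.P \equiv in^{\ell^t}\, n_\mu.Q$
$P \equiv Q \Rightarrow out^{\ell^t}\, n_\mu.P \equiv out^{\ell^t}\, n_\mu.Q$
$P \equiv Q \Rightarrow open^{\ell^t}\, n_\mu.P \equiv open^{\ell^t}\, n_\mu.Q$
$P \mid 0 \equiv P$
$(\nu_\mu n)0 \equiv 0$
$!0 \equiv 0$
$P \mid Q \equiv Q \mid P$
$(P \mid Q) \mid R \equiv P \mid (Q \mid R)$
$!P \equiv P \mid {!P}$
$(\nu_\mu n)(\nu_{\mu'} m)P \equiv (\nu_{\mu'} m)(\nu_\mu n)P$
$(\nu_\mu n)(P \mid Q) \equiv P \mid (\nu_\mu n)Q$   if $n \notin fn(P)$
$(\nu_\mu n)(m^{\ell^a}_{\mu'}[P]) \equiv m^{\ell^a}_{\mu'}[(\nu_\mu n)P]$   if $n \neq m$

Fig. 1. Structural congruence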



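$n^{\ell^a}_{\mu}[\,in^{\ell^t}\, m_{\mu'}.P \mid Q\,] \mid m^{\ell^{a'}}_{\mu'}[R] \;\to\; m^{\ell^{a'}}_{\mu'}[\,n^{\ell^a}_{\mu}[P \mid Q] \mid R\,]$

$m^{\ell^{a'}}_{\mu'}[\,n^{\ell^a}_{\mu}[\,out^{\ell^t}\, m_{\mu'}.P \mid Q\,] \mid R\,] \;\to\; n^{\ell^a}_{\mu}[P \mid Q] \mid m^{\ell^{a'}}_{\mu'}[R]$

$open^{\ell^t}\, n_\mu.P \mid n^{\ell^a}_{\mu}[Q] \;\to\; P \mid Q$

$\dfrac{P \to Q}{n^{\ell^a}_{\mu}[P] \to n^{\ell^a}_{\mu}[Q]}$    $\dfrac{P \to Q}{(\nu_\mu n)P \to (\nu_\mu n)Q}$

$\dfrac{P \to Q}{P \mid R \to Q \mid R}$    $\dfrac{P \equiv P' \quad P' \to Q' \quad Q' \equiv Q}{P \to Q}$

Fig. 2. Reduction relation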



The ambient can use the probe to fetch ambients with name k that are willing 
to be fetched; as an example we have: 

[open^^p^/ I Q] 

For convenience we will denote the composition of ambients w and k by P:

Fig. 3. Pictorial representation of the basic reduction rules



We illustrate the fact that w can use p to fetch k by the reduction sequence: 

I P] I | Q] 

^ I | [open^"^p^> \ Q] 

wf [P] I kf^„ [in^'^Wfj] I open^*p^,, \ Q] 

The reduction sequence shows that w sends p into k; k opens p and enters w. □ 



We usually consider processes of the form $P_\star$ executing in an environment
represented by the ambient $n_\star$ (with label $\ell^a_\star$). This amounts to systems of the
form $n_{\star\,\mu_\star}^{\ \ell^a_\star}[P_\star]$, such that neither $\mu_\star$ nor $\ell^a_\star$ occur inside $P_\star$.



3 Occurrence Counting Analysis 

In this section we present an analysis that counts occurrences of ambients. In the
following, an ambient will be identified by its label $\ell^a \in \mathrm{Lab}^a$ and a capability
by its label $\ell^t \in \mathrm{Lab}^t$.

The analysis works with powersets of representations of processes. A process can
be described by a triple $(I, H, A) \in \mathrm{InAmb} \times \mathrm{HNam} \times \mathrm{Accum}$; the individual
components of the triple are described below.

For each ambient the set of ambients and capabilities contained in it is recorded 
in the following component: 

$I \in \mathrm{InAmb} = \mathcal{P}(\mathrm{Lab}^a \times (\mathrm{Lab}^a \cup \mathrm{Lab}^t))$

If a process contains an ambient labelled $\ell^a$ enclosing a capability or ambient
labelled $\ell^{at}$ then $(\ell^a, \ell^{at}) \in I$ should hold in order for $(I, H, A)$ to be a correct
representation of the process.

Each occurrence of an ambient has a marker and to keep track of this information 
we have the component: 



$H \in \mathrm{HNam} = \mathcal{P}(\mathrm{Lab}^a \times \mathrm{Mar})$

If a process contains an ambient labelled $\ell^a$ with marker $\mu$ then $(\ell^a, \mu) \in H$
should hold in order for $(I, H, A)$ to be a correct representation of the process.

Furthermore the representation contains information about the number of oc- 
currences of each ambient called the multiplicity of the ambient; this information 
is recorded in the A-component: 

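$A \in \mathrm{Accum} = \mathrm{Lab}^a \to (\mathrm{Mult} \setminus \{0\})$

where $\mathrm{Mult} = \{0, 1, \omega\}$ ($\omega$ should be read as two or more, and 1 as exactly one)
with an “addition” operator $\oplus$ :

$$\begin{array}{c|ccc}
\oplus & 0 & 1 & \omega \\ \hline
0 & 0 & 1 & \omega \\
1 & 1 & \omega & \omega \\
\omega & \omega & \omega & \omega
\end{array}$$

If a process contains an ambient labelled $\ell^a$ then $A(\ell^a)$ should be 1 or $\omega$
(depending on the actual number of occurrences of $\ell^a$) in order for $(I, H, A)$ to be an
acceptable representation of the process.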

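We say that $(I, H, A)$ is compatible if whenever $(\ell^a, \ell^{at}) \in I$ then $(\{\ell^{at}\} \cap \mathrm{Lab}^a) \subseteq \mathrm{dom}(A)$
and whenever $(\ell^a, \mu) \in H$ then $\ell^a \in \mathrm{dom}(A)$.

A proposed analysis C is a powerset of representations of processes:

$C \in \mathrm{CountSet} = \mathcal{P}(\mathrm{Count})$

where $\mathrm{Count} = \{(I, H, A) \in \mathrm{InAmb} \times \mathrm{HNam} \times \mathrm{Accum} \mid (I, H, A) \text{ is compatible}\}$.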

Representation function. The representation function for a process is
defined in terms of an extraction function $\eta^{oc}_{\ell}$:

The definition of $\eta^{oc}_{\ell}$ is given in Fig. 4; it uses the operator $\uplus$ in order to combine
representations of processes (note that $\uplus$ produces a compatible triple from two
compatible triples):

$(I_1, H_1, A_1) \uplus (I_2, H_2, A_2) = (I_1 \cup I_2,\ H_1 \cup H_2,\ A_1 \oplus A_2)$




140 R. Rydhof Hansen et al. 



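Here $\oplus$ is extended to $\mathrm{Accum} \times \mathrm{Accum}$ as follows:

$$(A_1 \oplus A_2)(\ell) = \begin{cases}
A_1(\ell) \oplus A_2(\ell) & \text{if } \ell \in \mathrm{dom}(A_1) \wedge \ell \in \mathrm{dom}(A_2) \\
A_1(\ell) & \text{if } \ell \in \mathrm{dom}(A_1) \wedge \ell \notin \mathrm{dom}(A_2) \\
A_2(\ell) & \text{if } \ell \notin \mathrm{dom}(A_1) \wedge \ell \in \mathrm{dom}(A_2) \\
\text{undef} & \text{if } \ell \notin \mathrm{dom}(A_1) \wedge \ell \notin \mathrm{dom}(A_2)
\end{cases}$$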



The notation $[S \mapsto m]$ is used for the finitary function that is only defined on
S and that gives the constant value m for all arguments in S. Notice that all
ambients “inside” a replication are assigned the multiplicity $\omega$ as is natural due
to the congruence axiom $!P \equiv P \mid {!P}$ that ensures that $!P \equiv P \mid \cdots \mid P \mid {!P}$.
It is clear that $\eta^{oc}_{\ell}(P)$ is compatible.
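
The multiplicity arithmetic above is small enough to write out directly. The following Python sketch is an illustration under assumed encodings, not part of the analysis as published: multiplicities are the values `0`, `1` and the string `"w"` (standing for $\omega$), the partial function A is a dictionary from ambient labels to `1` or `"w"`, and the names `oplus`, `oplus_accum` and `combine` are hypothetical. A label missing from a dictionary is treated as multiplicity 0, which agrees with the four-case definition because 0 is the neutral element of the addition operator.

```python
OMEGA = "w"  # assumed encoding of the multiplicity "two or more"


def oplus(m1, m2):
    """Addition on Mult = {0, 1, OMEGA}: 0 is neutral, everything else saturates."""
    if m1 == 0:
        return m2
    if m2 == 0:
        return m1
    return OMEGA


def oplus_accum(a1: dict, a2: dict) -> dict:
    """Pointwise extension to partial functions (dicts); undefined stays undefined."""
    return {l: oplus(a1.get(l, 0), a2.get(l, 0)) for l in a1.keys() | a2.keys()}


def combine(r1, r2):
    """Combine two representations (I, H, A) componentwise."""
    (i1, h1, a1), (i2, h2, a2) = r1, r2
    return i1 | i2, h1 | h2, oplus_accum(a1, a2)
```

For example, combining two representations that each contain one occurrence of the same ambient label yields multiplicity `"w"` for that label, while a label that occurs in only one of them keeps its multiplicity.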



Analysis specification. The specification of the analysis is given in Fig. 5 and is 
explained below. The specification is compositional and thereby well-defined. 

The clause for $in^{\ell^t}\, n_\mu.P$ first checks the subprocess P. Then all representations of
processes in C are considered. Whenever the capability $\ell^t$ occurs inside an ambient
$\ell^a$ with a sibling $\ell^{a'}$ with the marker $\mu$, a demand on C is made depending on the
multiplicity of $\ell^a$. If $\ell^a$ has the multiplicity 1 it is required that the representation
where $\ell^a$ no longer is within its parent $\ell^{a''}$ but has moved inside its sibling $\ell^{a'}$
is recorded in C. Since we do not know whether or not a capability labelled $\ell^t$
occurs inside an ambient $\ell^a$ somewhere else in the process we have to make two
demands, one where $(\ell^a, \ell^t)$ has been removed from the representation and one
where it has not. If $\ell^a$ has the multiplicity $\omega$ we cannot make any assumption on
the number of ambients labelled $\ell^a$ that occur inside the ambient $\ell^{a''}$. Therefore
we make two demands on C, one that represents the case where two or more
ambients labelled $\ell^a$ occur inside $\ell^{a''}$ and one that represents the case where $\ell^a$
occurs exactly once inside $\ell^{a''}$. As for multiplicity 1 we have a demand where
$\ell^t$ still occurs in the process and one where it does not. The clause for
out-capabilities is similar.

The clause for $open^{\ell^t}\, n_\mu.P$ first checks the subprocess P. Then all representations
of processes in C are considered. Whenever the capability $\ell^t$ occurs with a sibling
$\ell^{a'}$ with the marker $\mu$, a demand on C is made depending on the multiplicity of
$\ell^{a'}$. If $\ell^{a'}$ has the multiplicity 1 it is required that the representation where $\ell^{a'}$
has been removed from the process (using $\setminus$ to reduce the domain of A) and the
ambients and capabilities within it have moved inside the parent $\ell^a$ is recorded
in C. Again we have to make the same demand where $\ell^t$ no longer occurs inside
$\ell^a$. If $\ell^{a'}$ has the multiplicity $\omega$, a rather involved demand on C is made. Since we
do not know which ambients and capabilities reported to be inside an ambient
labelled $\ell^{a'}$ are inside the ambient that is actually opened, we have to consider
all subsets Z of the ambients and capabilities reported to be inside an ambient
$\ell^{a'}$. Furthermore, we have to consider all subsets Y of Z, which represent the
ambients and capabilities that only occur inside the ambient being opened. The
ambients and capabilities in Z move inside the parent ambient $\ell^a$ and the ones
from Y are removed from the representation. As for the other cases considered
we have to include a representation where $\ell^t$ occurs in $\ell^a$ and one where it does
not. Furthermore there is some bookkeeping (represented by U and m) to ensure
that the H and A components are correct in special cases.

$\eta^{oc}_{\ell}((\nu_\mu n)P) = \eta^{oc}_{\ell}(P)$
$\eta^{oc}_{\ell}(0) = (\emptyset, \emptyset, \bot)$
$\eta^{oc}_{\ell}(P \mid Q) = \eta^{oc}_{\ell}(P) \uplus \eta^{oc}_{\ell}(Q)$
$\eta^{oc}_{\ell}(!P) = \text{let } (I, H, A) = \eta^{oc}_{\ell}(P) \text{ in } (I, H, [\mathrm{dom}(A) \mapsto \omega])$
$\eta^{oc}_{\ell}(n^{\ell^a}_{\mu}[P]) = \eta^{oc}_{\ell^a}(P) \uplus (\{(\ell, \ell^a)\}, \{(\ell^a, \mu)\}, [\{\ell^a\} \mapsto 1])$
$\eta^{oc}_{\ell}(in^{\ell^t}\, n_\mu.P) = \eta^{oc}_{\ell}(P) \uplus (\{(\ell, \ell^t)\}, \emptyset, \bot)$
$\eta^{oc}_{\ell}(out^{\ell^t}\, n_\mu.P) = \eta^{oc}_{\ell}(P) \uplus (\{(\ell, \ell^t)\}, \emptyset, \bot)$
$\eta^{oc}_{\ell}(open^{\ell^t}\, n_\mu.P) = \eta^{oc}_{\ell}(P) \uplus (\{(\ell, \ell^t)\}, \emptyset, \bot)$

Fig. 4. Extraction function for occurrence counting

3.1 Properties of the Analysis 

The following proposition establishes the semantic soundness of the occurrence 
counting analysis through a subject reduction result: 

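Proposition 1. Let $P, Q \in \mathrm{Proc}$, $\ell \in \mathrm{Lab}^a$ then

$\beta^{oc}_{\ell}(P) \subseteq C \ \wedge\ C \models^{oc} P \ \wedge\ P \to Q \ \Rightarrow\ \beta^{oc}_{\ell}(Q) \subseteq C \ \wedge\ C \models^{oc} Q$

We next show that the set of acceptable solutions constitutes a Moore family.
Recall that a subset of a complete lattice, $Y \subseteq L$, is a Moore family if whenever
$Y' \subseteq Y$ then $\sqcap Y' \in Y$.

Proposition 2. The set $\{C \mid C \models^{oc} P \ \wedge\ \beta^{oc}_{\ell}(P) \subseteq C\}$ is a Moore family for
every P and $\ell$.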

The Moore family property implies that the counting analysis admits an analysis 
of every process, and that every process has a least or best analysis. 

The set C may be infinite because it is a subset of the infinite set Count. However,
it is possible to restrict the set Count to be finite by restricting all ambient labels
and transition labels to be those occurring in the program $P_\star$ of interest; this
defines the finite sets $\mathrm{Count}_\star$ and $\mathrm{CountSet}_\star$ of which C is a member. It follows
that the least solution

$C_\star = \sqcap \{C \in \mathrm{CountSet}_\star \mid C \models^{oc} P_\star \ \wedge\ \beta^{oc}_{\ell^a_\star}(P_\star) \sqsubseteq C\}$

is in fact computable. However, it is likely to require exponential time to compute
$C_\star$ and this motivates defining a yet coarser analysis.
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n)P 


iffC 1=°'^ 


P 




0 




always 






p\ 


Q 


iffC 1=°'^ 


P A C 




!P 




iffC 1=°'^ 


P 






[P] 


iffC 1=°'^ 


P 




irA 




iffC 1=°'^ 


P A 



y{i,H,A) eC:vr,r',r" : 

/ e / A (r",r) e / a \ 

y(r",r') € 7 A (7“',p) e 77 J ^ 

/ case A{i°‘) of ^ 

l:(7\{(7“",7“)}u{(7“',7“)},77,i) eC A 
(7 \ { (7“" , 7“) , (7“ , 7‘) } U { {t , 7“) }, H, i) € C 
o.:(7 U 77, i) €C 

(7 \ { (r" , £“)} u {(r' , r )}, 77, i) € c 
(7\{(r,£*)}u{(r',r)},77,H) ec 

V (7 \ { (r" , 7“) , (r , £‘) } U { (£“' , r ) }, H, i) e c / 

C iffC|=°'=PA 

V(7,77,i) €C :V7“,r',r" : 

/ (r,7‘) e 7 A (r',r) e 7 a \ 
y(r",r') e 7 A (r',p) e 77 J ^ 

/ case H(7“) of ^ 

1 :(7 \ { (r , r )} u {(r" , r )}, 77 , i) e c 
(7 \ {(r' , r ), (r, £‘)l u {(£“", r)}, h,a)&c 
a.:(7u{(r",r)},77,i) €C 
(7 \ {(r , £*)} u {(r" , r)}, H,A)ec 
(7 \ { (r , r )} u {(r" , r )}, 77 , i) e c 

V (7 \ {(r' , r ), (r, £‘)l u {(r", r)}, H,A)&cj 

C 1=°'^ open^'n^.P iff C |=°'= P A 

y(l,H,A) eC:'ir,r' 

((r,7‘) e 7 A (r,r') e 7 a (r',p) e p) ^ 

/ case H(7“ ) of \ 

1:VX C {(£“,£*)} : 

(7 \ ({(£“',£') G 7|P e Lab} U {(£“,£“')} U X) U {(r,7')l (£“',£') £ 7}, 

^ \{(r', 1^)1 m£ Mar}, i\{r'})eC 
o>:VX C {(£“',£') G 7|P G Lab} : VH C Z : VX C {(£“,£*)} : 

VP c {(r',^i)} s.t. U / {(r',^') G 77|^' G Mar} : Vm G {l,w} : 

V {{i\{YuX))u{{e,£')\ie' ,£') € Z},H\U,A[r' ^m]) €C / 



Fig. 5. Specification of analysis with occurrence counting 
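(res)   $\beta^{CF}_{\ell}((\nu_\mu n)P) = \beta^{CF}_{\ell}(P)$
(zero)  $\beta^{CF}_{\ell}(0) = (\emptyset, \emptyset)$
(par)   $\beta^{CF}_{\ell}(P \mid Q) = \beta^{CF}_{\ell}(P) \sqcup \beta^{CF}_{\ell}(Q)$
(repl)  $\beta^{CF}_{\ell}(!P) = \beta^{CF}_{\ell}(P)$
(amb)   $\beta^{CF}_{\ell}(n^{\ell^a}_{\mu}[P]) = \beta^{CF}_{\ell^a}(P) \sqcup (\{(\ell, \ell^a)\}, \{(\ell^a, \mu)\})$
(in)    $\beta^{CF}_{\ell}(in^{\ell^t}\, n_\mu.P) = \beta^{CF}_{\ell}(P) \sqcup (\{(\ell, \ell^t)\}, \emptyset)$
(out)   $\beta^{CF}_{\ell}(out^{\ell^t}\, n_\mu.P) = \beta^{CF}_{\ell}(P) \sqcup (\{(\ell, \ell^t)\}, \emptyset)$
(open)  $\beta^{CF}_{\ell}(open^{\ell^t}\, n_\mu.P) = \beta^{CF}_{\ell}(P) \sqcup (\{(\ell, \ell^t)\}, \emptyset)$

Fig. 6. Representation Function for CFA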



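(res)   $(I, H) \models^{CF} (\nu_\mu n)P$   iff   $(I, H) \models^{CF} P$
(zero)  $(I, H) \models^{CF} 0$   always
(par)   $(I, H) \models^{CF} P \mid Q$   iff   $(I, H) \models^{CF} P \wedge (I, H) \models^{CF} Q$
(repl)  $(I, H) \models^{CF} {!P}$   iff   $(I, H) \models^{CF} P$
(amb)   $(I, H) \models^{CF} n^{\ell^a}_{\mu}[P]$   iff   $(I, H) \models^{CF} P$
(in)    $(I, H) \models^{CF} in^{\ell^t}\, n_\mu.P$   iff   $(I, H) \models^{CF} P \ \wedge$
        $\forall \ell^a, \ell^{a'}, \ell^{a''} \in \mathrm{Lab}^a : ((\ell^a, \ell^t) \in I \wedge (\ell^{a''}, \ell^a) \in I \wedge (\ell^{a''}, \ell^{a'}) \in I \wedge (\ell^{a'}, \mu) \in H)$
        $\Rightarrow (\ell^{a'}, \ell^a) \in I$
(out)   $(I, H) \models^{CF} out^{\ell^t}\, n_\mu.P$   iff   $(I, H) \models^{CF} P \ \wedge$
        $\forall \ell^a, \ell^{a'}, \ell^{a''} \in \mathrm{Lab}^a : ((\ell^a, \ell^t) \in I \wedge (\ell^{a'}, \ell^a) \in I \wedge (\ell^{a''}, \ell^{a'}) \in I \wedge (\ell^{a'}, \mu) \in H)$
        $\Rightarrow (\ell^{a''}, \ell^a) \in I$
(open)  $(I, H) \models^{CF} open^{\ell^t}\, n_\mu.P$   iff   $(I, H) \models^{CF} P \ \wedge$
        $\forall \ell^a, \ell^{a'} \in \mathrm{Lab}^a : ((\ell^a, \ell^t) \in I \wedge (\ell^a, \ell^{a'}) \in I \wedge (\ell^{a'}, \mu) \in H)$
        $\Rightarrow \{(\ell^a, \ell') \mid (\ell^{a'}, \ell') \in I\} \subseteq I$

Fig. 7. Specification of CFA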



4 Control Flow Analysis 

The control flow analysis can be obtained from the counting analysis by dispen- 
sing with the A component and by merging the resulting pairs (I, H) into one 
(by taking their least upper bound). In other words, the analysis works on pairs 

$(I, H) \in \mathrm{InAmb} \times \mathrm{HNam}$.
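
The step of dropping the A component and merging the remaining pairs can be pictured with a few lines of Python. This is a sketch under assumed encodings, not the paper's formal definition: a counting result is a set of triples whose I and H components are `frozenset`s of pairs and whose A component is any hashable value, and the function name `merge_pairs` is hypothetical. Since both components are ordered by set inclusion, the least upper bound of the pairs is the componentwise union.

```python
def merge_pairs(count_set):
    """Forget the A component of every triple and join the (I, H) pairs."""
    merged_i, merged_h = set(), set()
    for i, h, _a in count_set:
        merged_i |= i  # least upper bound on InAmb is set union
        merged_h |= h  # likewise on HNam
    return merged_i, merged_h
```

Applied to a set containing one triple in which ambient `a2` is not yet inside `a1` and another in which it is, the merged I simply contains both containment facts, which is exactly the loss of precision that the control flow analysis accepts.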

In keeping with the counting analysis, the control flow analysis is defined by 
a representation function and a specification. The representation function is 
shown in Fig. 6 and the analysis specification in Fig. 7. The specification of the 
analysis for the cases (res), (zero), (par), (repl) and (amb) are merely recursive
checks of subprocesses. The case (in) states that if some ambient, labelled $\ell^a$, has
an in-capability (denoted by $(\ell^a, \ell^t) \in I$) and has a sibling (denoted
$(\ell^{a''}, \ell^a) \in I \wedge (\ell^{a''}, \ell^{a'}) \in I$) with the right name (denoted $(\ell^{a'}, \mu) \in H$) then the possibility
of that ambient using the in-capability should also be recorded in I (denoted
$(\ell^{a'}, \ell^a) \in I$). The cases (out) and (open) are similar.



Example 2. Recall the process, P, of Example 1 and let P and Q equal 0 for
simplicity. For $\ell \in \mathrm{Lab}^a$, the least analysis satisfying $\beta^{CF}_{\ell}(P) \sqsubseteq (I, H) \wedge (I, H) \models^{CF} P$
is given by:

/ = { (C , r" ), (^^ r' ),(£“, r) , (r , r" ),(r, r ' ) , (r' ,£\), (r ' , 4 ) , (r' , 4) , 
(r" , r" ) , (r" , r' ) , (r" ,£{), (r" , 4) , (r" , 4) , (r" , 4) } 

4 = {(r", 4 '),(^“', 4 ),(^“,A^)} 



where all counting information of the reduction sequence has been discarded and 
the remaining information has been merged. □ 
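
To make the notion of a least analysis concrete, the following Python sketch computes the smallest pair (I, H) that contains the representation of a process and is closed under the (in), (out) and (open) conditions described above. It is only an illustration, not the authors' implementation: the tuple encoding of processes, the label strings, and the function names `beta_cf`, `capabilities` and `least_cfa` are assumptions of this sketch, and the closure is computed by naive iteration.

```python
# Assumed process encoding (tuples):
#   ("nil",) | ("par", P, Q) | ("repl", P) | ("res", n, mu, P)
#   ("amb", n, mu, la, P) | ("in" / "out" / "open", lt, n, mu, P)

def beta_cf(l, p):
    """Representation of p inside the ambient labelled l, as a pair (I, H)."""
    tag = p[0]
    if tag == "nil":
        return set(), set()
    if tag == "par":
        i1, h1 = beta_cf(l, p[1])
        i2, h2 = beta_cf(l, p[2])
        return i1 | i2, h1 | h2
    if tag == "repl":
        return beta_cf(l, p[1])
    if tag == "res":
        return beta_cf(l, p[3])
    if tag == "amb":
        _, _n, mu, la, body = p
        i, h = beta_cf(la, body)
        return i | {(l, la)}, h | {(la, mu)}
    _, lt, _n, _mu, body = p  # in / out / open
    i, h = beta_cf(l, body)
    return i | {(l, lt)}, h


def capabilities(p):
    """Yield (kind, transition label, marker) for every capability in p."""
    tag = p[0]
    if tag in ("in", "out", "open"):
        yield tag, p[1], p[3]
    for child in p[1:]:
        if isinstance(child, tuple):
            yield from capabilities(child)


def least_cfa(l, p):
    I, H = beta_cf(l, p)
    caps = list(capabilities(p))
    while True:
        new = set()
        for kind, lt, mu in caps:
            for la in [a for (a, x) in I if x == lt]:  # la holds the capability
                if kind == "in":
                    for lpp in [q for (q, y) in I if y == la]:       # parent of la
                        for lp in [y for (q, y) in I if q == lpp]:   # sibling
                            if (lp, mu) in H:
                                new.add((lp, la))
                elif kind == "out":
                    for lp in [q for (q, y) in I if y == la]:        # parent of la
                        if (lp, mu) in H:
                            new |= {(lpp, la) for (lpp, y) in I if y == lp}
                else:  # open
                    for lp in [y for (q, y) in I if q == la]:        # sibling
                        if (lp, mu) in H:
                            new |= {(la, x) for (q, x) in I if q == lp}
        if new <= I:
            return I, H
        I |= new
```

For a process consisting of an ambient `m` (label `a1`) holding an in-capability for `n`, in parallel with an ambient `n` (label `a2`), the result records the additional fact that `a1` may end up inside `a2`; H is never changed by the closure.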



The control flow analysis was developed in [7] where the formal properties of 
the analysis were also established and where the analysis was used to validate 
the protectiveness of a proposed firewall. Below we first develop a notion of how 
to induce constraint-based analyses and then show that the analysis can also be 
induced from the counting analysis, whence the formal properties of the control 
flow analysis also follow from the general framework presented here. 



4.1 Systematic Construction of Analyses 

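The framework presented in this paper lends itself to a systematic approach
analogous to the use of Galois connections in abstract interpretation. In what
follows we assume that $\sqsubseteq_{\mathcal{A}}$ is an ordering of the complete lattice $\mathcal{A}$ and that $\sqsubseteq_{\mathcal{A}'}$
is an ordering of the complete lattice $\mathcal{A}'$; we shall write $\sqsubseteq$ for both orderings
where no confusion can arise. Here a Galois connection between $\mathcal{A}$ and $\mathcal{A}'$,
denoted $\mathcal{A}' \underset{\alpha}{\overset{\gamma}{\rightleftarrows}} \mathcal{A}$, is a pair of monotone functions, $\alpha : \mathcal{A}' \to \mathcal{A}$ and
$\gamma : \mathcal{A} \to \mathcal{A}'$, such that $id_{\mathcal{A}'} \sqsubseteq \gamma \circ \alpha$ and $\alpha \circ \gamma \sqsubseteq id_{\mathcal{A}}$. We are now in a position to define
the key notion of an induced satisfaction relation:

Definition 1 (Induced Satisfaction Relation). Let $\mathcal{A}' \underset{\alpha}{\overset{\gamma}{\rightleftarrows}} \mathcal{A}$ be a Galois
connection. A satisfaction relation $\models\ : \mathrm{Proc} \times \mathcal{A} \to \{tt, ff\}$ is said to be induced
from another satisfaction relation $\models'\ : \mathrm{Proc} \times \mathcal{A}' \to \{tt, ff\}$, when the following
holds:

$\forall A \in \mathcal{A}, P \in \mathrm{Proc} : A \models P \iff \gamma(A) \models' P$

Analogously, representation functions may be induced: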
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7 

Definition 2 (Induced Representation Function). Let A' ' ^ > A he a 

Galois eonnection, then a representation function, j3 : Proc A is said to he 
induced from a representation function, j3' : Proc — > A' whenever: 

(3 = ao (3' 

An induced satisfaction relation inherits several important formal properties of 
the original satisfaction relation, such as subject reduction and Moore family 
properties. We first show that subject reduction is preserved: 

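Proposition 3. Let $\models\ : Proc \times \mathcal{A} \to \{tt, ff\}$, $\beta : Proc \to \mathcal{A}$ and
$\models'\ : Proc \times \mathcal{A}' \to \{tt, ff\}$, $\beta' : Proc \to \mathcal{A}'$ be given such that $\models$ and $\beta$ are
induced from $\models'$ and $\beta'$ respectively, via the Galois connection
$\mathcal{A}' \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} \mathcal{A}$. Then

$\beta'(P) \sqsubseteq A' \wedge A' \models' P \wedge P \to Q \implies \beta'(Q) \sqsubseteq A' \wedge A' \models' Q$

implies

$\beta(P) \sqsubseteq A \wedge A \models P \wedge P \to Q \implies \beta(Q) \sqsubseteq A \wedge A \models Q$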



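Proof. We calculate as follows:

$\beta(P) \sqsubseteq A \wedge A \models P \wedge P \to Q$
$\iff \beta'(P) \sqsubseteq \gamma(A) \wedge \gamma(A) \models' P \wedge P \to Q$
$\implies \beta'(Q) \sqsubseteq \gamma(A) \wedge \gamma(A) \models' Q$
$\iff \beta(Q) \sqsubseteq A \wedge A \models Q$

This concludes the proof. □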



Next we show that the Moore family property is also preserved: 

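Proposition 4. Let $\models\ : Proc \times \mathcal{A} \to \{tt, ff\}$, $\beta : Proc \to \mathcal{A}$ and
$\models'\ : Proc \times \mathcal{A}' \to \{tt, ff\}$, $\beta' : Proc \to \mathcal{A}'$ be given such that $\models$ and $\beta$ are
induced from $\models'$ and $\beta'$ respectively, via the Galois connection
$\mathcal{A}' \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} \mathcal{A}$. If

$\{A' \in \mathcal{A}' \mid A' \models' P \wedge \beta'(P) \sqsubseteq A'\}$

is a Moore family for every $P$, then

$\{A \in \mathcal{A} \mid A \models P \wedge \beta(P) \sqsubseteq A\}$

is also a Moore family for every $P$.

[Figure: CountSet and InAmb × HNam related by $\alpha^{CF}$ and $\gamma^{CF}$, both drawn over Proc.]

Fig. 8. Galois connection
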

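Proof. Let $P \in Proc$ and $A_i \in \mathcal{A}$ for all $i \in I$ where $I$ is an index set. We then
calculate as follows:

$\forall i \in I : (A_i \models P \wedge \beta(P) \sqsubseteq A_i)$
$\iff \forall i \in I : (\gamma(A_i) \models' P \wedge \beta'(P) \sqsubseteq \gamma(A_i))$
$\implies \sqcap_i \gamma(A_i) \models' P \wedge \beta'(P) \sqsubseteq \sqcap_i \gamma(A_i)$
$\iff \gamma(\sqcap_i A_i) \models' P \wedge \beta'(P) \sqsubseteq \gamma(\sqcap_i A_i)$
$\implies \sqcap_i A_i \models P \wedge \beta(P) \sqsubseteq \sqcap_i A_i$

This concludes the proof. □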
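
The definitions above can be checked by brute force on a small finite example. The following is a minimal sketch, not the ambient-calculus analyses themselves: the two lattices, `alpha`, `gamma`, `sat_prime` and `beta_prime` are hypothetical choices made only so that the Galois connection, the induced relation of Definition 1, the induced representation function of Definition 2, and the Moore family property can all be enumerated.

```python
from itertools import combinations

UNIVERSE = (0, 1, 2, 3)

# A': all subsets of UNIVERSE, ordered by inclusion (hypothetical toy lattice).
A_PRIME = [frozenset(c) for n in range(len(UNIVERSE) + 1)
           for c in combinations(UNIVERSE, n)]
# A: upper bounds ordered by <=; bound b stands for {0, ..., b}, -1 for the empty set.
A = [-1, 0, 1, 2, 3]


def alpha(s):
    """Abstraction A' -> A: the largest element of the set (-1 if empty)."""
    return max(s, default=-1)


def gamma(b):
    """Concretisation A -> A': the set {0, ..., b}."""
    return frozenset(range(b + 1))


def sat_prime(a_prime, p):
    """Toy satisfaction relation on A': the estimate must cover the 'process' p."""
    return p <= a_prime


def beta_prime(p):
    """Toy representation function on A': a process is represented by itself."""
    return p


def sat(a, p):
    """Induced satisfaction relation (Definition 1)."""
    return sat_prime(gamma(a), p)


def beta(p):
    """Induced representation function (Definition 2)."""
    return alpha(beta_prime(p))


def is_galois_connection():
    """Check id <= gamma . alpha on A' and alpha . gamma <= id on A."""
    return (all(s <= gamma(alpha(s)) for s in A_PRIME)
            and all(alpha(gamma(b)) <= b for b in A))


def solutions(p):
    """All acceptable estimates of p in A."""
    return [a for a in A if sat(a, p) and beta(p) <= a]


def is_moore_family(sols):
    """On a finite chain: contains the top element and is closed under min."""
    return max(A) in sols and all(min(x, y) in sols for x in sols for y in sols)
```

In this toy setting every process (a subset of `UNIVERSE`) has a solution set that is an upper segment of the chain, which is why the Moore family check succeeds.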

4.2 Properties of the Analysis 

In order to show that the control flow analysis is induced from the counting
analysis we need a Galois connection. In the following $\sqsubseteq$ is the coordinatewise
ordering on InAmb × HNam. We define

$\alpha^{CF}(C) = \bigsqcup \{(I, H) \mid (I, H, A) \in C\}$

$\gamma^{CF}(I, H) = \{(I', H', A') \mid (I', H') \sqsubseteq (I, H) \wedge (I', H', A') \text{ is compatible}\}$

and note that $CountSet \underset{\alpha^{CF}}{\overset{\gamma^{CF}}{\leftrightarrows}} InAmb \times HNam$ is a Galois connection (see
Fig. 8). The following proposition then states that the control flow analysis is
induced, cf. Definition 1, from the analysis with occurrence counting:

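Proposition 5. Let $P \in Proc$ and $\ell \in Lab^a$; then

$(I, H) \models^{CF}_{\ell} P \iff \gamma^{CF}(I, H) \models_{\ell} P$

and

$\beta^{CF}_{\ell}(P) = \alpha^{CF}(\beta_{\ell}(P))$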

From Propositions 1, 3 and 5, the semantic correctness of the control
flow analysis follows immediately:

Corollary 1. Let $P, Q \in Proc$ and $\ell \in Lab^a$; then

$\beta^{CF}_{\ell}(P) \sqsubseteq (I, H) \wedge (I, H) \models^{CF}_{\ell} P \wedge P \to Q \implies \beta^{CF}_{\ell}(Q) \sqsubseteq (I, H) \wedge (I, H) \models^{CF}_{\ell} Q$

Furthermore, the Moore family property of the control flow analysis follows from
Propositions 2, 4 and 5:

Corollary 2. The set $\{(I, H) \mid (I, H) \models^{CF}_{\ell} P \wedge \beta^{CF}_{\ell}(P) \sqsubseteq (I, H)\}$ is a Moore
family for every $P$ and $\ell$.


In [5] it is shown that by restricting the attention to a given system, [P*] , of 
size s, it is possible to devise an O(s^) algorithm for computing the least solution 
to the control flow analysis. 



5 Conclusion 

We have shown that abstract interpretation can be formulated in a
constraint-based manner that makes it useful for analysing calculi of computation
such as the ambient calculus, without the need to assume that processes are in
a simplified form. The development mimics the familiar use of abstract
interpretation to induce general abstract transfer functions from others.
A counting analysis is shown to be semantically correct and to have solutions 
that constitute a Moore family; a previously developed control flow analysis 
is induced from the counting analysis and its properties derived. In our view 
this development demonstrates that abstract interpretation and constraint-based 
analyses naturally complement the use of type systems for analysing calculi of 
computation.

Abstract. Cryptographic protocols have so far been analyzed for the
most part by means of testing (which does not yield proofs of secrecy) 
and theorem proving (costly). We propose a new, abstract interpretation 
based, approach, using regular tree languages. The abstraction we use 
seems fine-grained enough to be able to certify some protocols. Both the 
concrete and abstract semantics of the protocol description language and 
implementation issues are discussed in the paper. 



1 Introduction 

Our goal is to provide mathematical and algorithmic tools for the analysis of 
cryptographic protocols through abstract interpretation. 

1.1 Verifying Cryptographic Protocols 

Cryptographic protocols are specifications for sequences of messages to be ex- 
changed by machines on a possibly insecure network, such as the Internet, to 
establish private or authenticated communication. These protocols can be used 
to distribute sensitive information, such as classified material, credit card num- 
bers or trade secrets, or to create digital signatures. 

Many cryptographic protocols have been found to be flawed; that is, there 
exists a way for an intruder that has gained partial or total control over the 
communication network and is able to read, suppress and forge messages to 
trick the communicating machines into revealing some sensitive information or 
believing they have an authenticated communication, whereas they are actually 
communicating with the intruder. Several tools and techniques have therefore 
been devised for analyzing and verifying the security of cryptographic protocols. 

A common feature of these techniques, including ours, is that they address the
design of the protocol rather than the strength of the underlying cryptographic
algorithms, such as message digests or encryption primitives. For instance, it is
assumed that one may decrypt a message encrypted with a public key only when
possessing the corresponding private key.

* This work was partially funded by NSF grant CCR-9509931.

Whereas belief logics [3,9,8,19,20] try to deal with the rationale behind the 
design of a protocol, the other methods (theorem proving, model checking) are 
based on some kind of well-defined model of the computation [15]. The next part 
of this paper will describe the model we are considering. 

Methods based on such models can be classified into two main categories: 

1. Testing: Here a limited but wide set of possible attacks is generated and 
systematically tried against the protocol. The hope is that this set is wide 
enough so that any attack will be detected. In other words, a large subset 
of the space of reachable states of a certain configuration of a protocol is 
exhaustively explored by concrete model-checking. Efficient implementations 
have been devised [14,17,13]. 

2. Theorem proving: Here a semi-automated proof system is used; while such 
a method consumes lots of human resources, automation inside the tool can 
make it more bearable [18]. 

This paper intends to demonstrate how abstract interpretation techniques, 
and more particularly abstract model checking of infinite state computation 
systems, can be applied to the problem of analyzing cryptographic protocols. To 
our knowledge, this is the first time that an abstract domain has been proposed 
for cryptographic protocols. 

A salient point of our approach is that it is fully automatic from the protocol
description to the results. Contrary to some other methods that use abstraction
but require the user to design an abstraction by hand or to manually help a
program compute invariants, our method requires no user input except the
description of the protocol and the cryptographic primitives involved.



1.2 Abstract Interpretation 

Abstract interpretation [5,6] is a generic theory for the analysis of computation
systems. Its basic idea is to use approximations in ordered domains in a known
direction (lower or upper), to get reliable results. This order relation is preserved
by monotonic operators.

Here we shall approximate transition systems. We consider a transition
relation $r$ on a “concrete” state space $\Sigma$. We also consider an “abstract”
transition relation $r^\sharp$ on an “abstract” state space $\Sigma^\sharp$. An abstraction relation
$a \subseteq \Sigma \times \Sigma^\sharp$ links the two spaces. By $a^{-1}(X^\sharp)$, where $X^\sharp \subseteq \Sigma^\sharp$, we mean

$\{x \in \Sigma \mid \exists x^\sharp \in X^\sharp\ a(x, x^\sharp)\}.$

For instance, $\Sigma$ could be $\wp(\mathbb{Z})$ and $\Sigma^\sharp$ the set of (possibly empty) intervals
of $\mathbb{Z}$ (given by their bounds). The abstraction relation, in that example, is the
following:

$\forall A \in \wp(\mathbb{Z})\quad a(A, [\alpha, \beta]) \iff A \subseteq [\alpha, \beta].$

We require that the two relations satisfy the following simulation condition¹
(see Fig. 1):

$\forall x, y \in \Sigma,\ x^\sharp \in \Sigma^\sharp\quad r(x, y) \wedge a(x, x^\sharp) \implies \exists y^\sharp \in \Sigma^\sharp\ r^\sharp(x^\sharp, y^\sharp) \wedge a(y, y^\sharp).$

This implies that for all $\sigma_0$ and $\sigma_0^\sharp$ so that $a(\sigma_0, \sigma_0^\sharp)$, noting $A_0 = \{\sigma \mid \sigma_0 \to^* \sigma\}$
and $A_0^\sharp = \{\sigma^\sharp \mid \sigma_0^\sharp \to^{\sharp *} \sigma^\sharp\}$, $A_0 \subseteq a^{-1}(A_0^\sharp)$.

Fig. 1. The abstract transition relation follows the concrete one.



We are only interested in safety properties; in the concrete model we are
considering here, liveness properties can’t be obtained, since the intruder can
deny any network service by just stopping network transmission. To prove that
a property $P$ holds for all elements in $A_0$, it is sufficient to show that it holds for
all elements in $a^{-1}(A_0^\sharp)$. That will be the basic idea of our method of analysis.
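
The interval example and the simulation condition can be made concrete with a small sketch. Only the abstraction relation (a set of integers is abstracted by any interval containing it) comes from the text; the two transition relations below are hypothetical and were chosen so that the simulation condition can be checked by enumeration.

```python
def abstracts(A, itv):
    """a(A, [lo, hi]) <=> A is included in [lo, hi]; None is the empty interval."""
    if itv is None:
        return not A
    lo, hi = itv
    return all(lo <= x <= hi for x in A)


def concrete_successors(A):
    """Hypothetical concrete relation r: shift every element, or drop the maximum."""
    succs = [frozenset(x + 1 for x in A)]
    if A:
        succs.append(A - {max(A)})
    return succs


def abstract_successors(itv):
    """Hypothetical abstract relation: shift the interval, or keep it unchanged."""
    if itv is None:
        return [None]
    lo, hi = itv
    return [(lo + 1, hi + 1), (lo, hi)]


def simulation_holds(A, itv):
    """Every concrete step from A is matched by some abstract step from itv."""
    if not abstracts(A, itv):
        return True  # the condition only constrains related pairs
    return all(any(abstracts(B, j) for j in abstract_successors(itv))
               for B in concrete_successors(A))
```

A safety property proved for every set covered by the reachable intervals then holds for every reachable concrete set, which is the argument used in the rest of the paper.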

2 Concrete Model 

There exists no standard model for cryptographic protocols, although there is 
progress in the design of common notations with well-defined semantics [16,10]. 
We therefore had to provide a sensible model of what a cryptographic protocol 
is. We chose a simple syntax and a simple semantics, appropriate to describe the 
interactions between a fixed number of machines, or principals (subtler models, 
like the spi-calculus [1], could perhaps be used for successful analyses, but they 
are significantly more complex than ours). 

2.1 Terms, Rewrite Systems, and Notations 

Let us consider a signature [11, p. 249] [4, preliminaries] $\mathcal{F}$ and the free algebra
of terms $T(\mathcal{F})$ on that signature. Messages exchanged on the network are
elements of that algebra. We will also consider the algebra $T(\mathcal{F}, \mathcal{X})$ of terms with
variables in $\mathcal{X}$. When $t \in T(\mathcal{F}, \mathcal{X})$, $(X_i)_{i \in I}$ is a family of variables, $(x_i)_{i \in I}$ is a
family of terms, we write $t[x_i/X_i]$ the term obtained by parallel substitution of
$X_i$ by $x_i$ in $t$. We note $FV(t)$ the set of free variables of $t$.
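
As a rough illustration of the notation, terms can be encoded as nested tuples. The encoding is an assumption of this sketch (constants and variables are strings, an application is a tuple headed by its symbol), as are the function names `free_vars` and `substitute`.

```python
def free_vars(t, variables):
    """FV(t): the names from `variables` that occur in the term t."""
    if isinstance(t, str):
        return {t} if t in variables else set()
    return set().union(*(free_vars(arg, variables) for arg in t[1:]))


def substitute(t, sigma):
    """t[x_i / X_i]: replace each variable X_i by sigma[X_i], all at once."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(substitute(arg, sigma) for arg in t[1:])
```

Because the substitution is parallel, a replacement term is never itself rewritten: substituting `X1` by `X2` and `X2` by `a` in `pair(X1, X2)` gives `pair(X2, a)`, not `pair(a, a)`.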

Let us also consider a notion of “possible computation”; this notion is defined
by a function $\mathcal{K} : \wp(T(\mathcal{F})) \to \wp(T(\mathcal{F}))$ that computes the closure of a subset of
$T(\mathcal{F})$ by the following operations:

— a subset $\mathcal{O}$ of the function symbols found in $\mathcal{F}$; that is, if the symbol $f$
belongs to the subset $\mathcal{O}_n$ of elements of $\mathcal{O}$ of arity $n$, then for every $n$-tuple
$(x_i)_{1 \le i \le n}$ of elements of $\mathcal{K}(X)$, $f(x_1, \ldots, x_n)$ belongs to $\mathcal{K}(X)$;

— a set $\mathcal{R}$ of rewrite rules [11, p. 252] over $T(\mathcal{F}, \mathcal{X})$ of a certain kind described in
the next paragraph.

So an element $x$ of $T(\mathcal{F})$ is deemed to be “possibly computable” from $X \subseteq T(\mathcal{F})$
if $x \in \mathcal{K}(X)$. We write $\wp(T(\mathcal{F}))_{\mathcal{K}}$ for the set of fixpoints of $\mathcal{K}$.

We require that the rules in $\mathcal{R}$ be of the following form: $a \to x$, where $a$ is a
term with variables over the signature $\mathcal{F}$ and $x$ is a variable that appears
exactly once in $a$. We will call such systems simplification systems.

¹ Readers coming from a type theory background may see the simulation condition
of Sect. 1.2 as a kind of subject reduction property.

Example. We shall consider the following signature $\mathcal{O}_c$:

$\mathcal{O}$ = {pair(, ); proj1(); proj2(); encrypt(, ); decrypt(, ); pk_encrypt(, ); pk_decrypt(, )}
$\mathcal{O}_c$ = $\mathcal{O}$ ∪ {public(), private()}

and the following rewrite rules:

— proj1(pair(x, y)) → x,

— proj2(pair(x, y)) → y,

— decrypt(encrypt(x, k), k) → x,

— pk_decrypt(pk_encrypt(x, public(k)), private(k)) → x.
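
The four rules of the example can be turned into a small normalisation function. This is a sketch under the assumption that terms are nested tuples headed by their symbol and that constants are strings; `is_app` and `simplify` are names introduced here for illustration.

```python
def is_app(t, symbol):
    """True if t is an application of `symbol`."""
    return isinstance(t, tuple) and t[0] == symbol


def simplify(t):
    """Normalise t bottom-up with the four rewrite rules of the example."""
    if isinstance(t, str):
        return t
    head, *args = t
    args = [simplify(a) for a in args]
    if head == "proj1" and is_app(args[0], "pair"):
        return args[0][1]
    if head == "proj2" and is_app(args[0], "pair"):
        return args[0][2]
    if head == "decrypt" and is_app(args[0], "encrypt") and args[0][2] == args[1]:
        return args[0][1]
    if (head == "pk_decrypt" and is_app(args[0], "pk_encrypt")
            and is_app(args[0][2], "public") and is_app(args[1], "private")
            and args[0][2][1] == args[1][1]):
        return args[0][1]
    return (head, *args)
```

Each rule returns a subterm of an already normalised argument, so one bottom-up pass is enough; this is a consequence of every right-hand side being a single variable.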

2.2 Concrete Semantics 

Let us consider a finite set $\mathcal{P}$ of principals. Each principal $p \in \mathcal{P}$ has a finite
set $R_p$ of registers, each containing an element of $T(\mathcal{F}) \cup \{\bot\}$ (the $\bot$ element
meaning “uninitialized”), and a program $x_p$ to execute. The program is a finite
sequence (possibly empty) of commands, which can be of three possible
types:

— $!t$, read as “output $t$”, where $t \in T(\mathcal{F}, R_p)$;

— $r \simeq t$, read as “match register $r$ against $t$”, where $r \in R_p$ and
$t \in T(\mathcal{F}, R_p \cup \bar{R}_p)$; by $r$ we shall mean “the current contents of register $r$” and
by $\bar{r}$ we shall mean “store matched value into register $r$”; $\bar{R}_p = \{\bar{r} \mid r \in R_p\}$
is a copy of $R_p$.

— $?r$, read as “input register $r$”, where $r \in R_p$.

We shall write $h :: t$ the sequence whose head is $h$ and tail $t$, and $\varepsilon$ the empty
sequence. The local state of a principal is therefore the content of its registers
and the program it has yet to execute. The global state is the tuple (indexed
by $\mathcal{P}$) of the local states, together with the state of the intruder, which is an
element of $\wp(T(\mathcal{F}))_{\mathcal{K}}$. The set of global states is noted $\Sigma$.

We define the semantics of the system by a nondeterministic transition
relation $\to$. Let $S$ and $S'$ be two global states. We note $S.p$ the local state of the
principal $p$ in $S$ and $S.I$ the intruder knowledge in $S$. In a local state $L$, we
note $L.r$ the contents of register $r$ and $L.P$ the program. The definition of the
transition relation is the following: $S \to S'$ if there exists $p_0 \in \mathcal{P}$ so that:

— for all $p \in \mathcal{P}$ so that $p \neq p_0$, $S'.p = S.p$;

— $S.p_0.P = h :: \tau$ and either

  — $h = {?r_0}$ and

    • for all $r \in R_{p_0}$ so that $r \neq r_0$, $S'.p_0.r = S.p_0.r$,

    • $S'.p_0.r_0 \in S.I$

    • $S'.p_0.P = \tau$

  — $h = {!t}$ and

    • for all $r \in R_{p_0}$, $S'.p_0.r = S.p_0.r$,

    • $S'.I = \mathcal{K}(S.I \cup \{t[S.p_0.r/r \mid r \in R_{p_0}]\})$

    • $S'.p_0.P = \tau$

  — $h = r \simeq t$ and either

    • there exists a unifier for the variables in $\bar{R}_{p_0}$ between $t[S.p_0.r/r \mid r \in R_{p_0}]$
      and $S.p_0.r$; then

      ■ for all $\bar{r} \in \bar{R}_{p_0} \setminus FV(t)$, $S'.p_0.r = S.p_0.r$

      ■ $t[S.p_0.r/r \mid r \in R_{p_0},\ S'.p_0.r/\bar{r} \mid \bar{r} \in \bar{R}_{p_0}] = S.p_0.r$

      ■ $S'.p_0.P = \tau$

    • such a unifier does not exist; then

      ■ for all $r \in R_{p_0}$, $S'.p_0.r = S.p_0.r$

      ■ $S'.p_0.P = \varepsilon$

3 Tree Automata and Operations on Them 

Regular languages, implemented as finite automata, are a well known domain 
abstracting sets of words on an alphabet. Here, we abstract sets of terms on a 
signature by regular tree languages, and we consider the generalization to n-ary 
constructors of finite automata: tree automata [4]. 

Please note that the algorithms presented here are given mainly as proofs that 
the functions described are computable. There are several ways to implement the 
same functions, and efficient implementations are likely to be more complex than 
the simple schemes given here. 



3.1 Tree Automata 

We use non-deterministic top-down tree automata [4, §1.6] to represent subsets of
$T(\mathcal{F})$; an automaton is a finite representation of the subset of terms it recognizes.
A top-down tree automaton over $\mathcal{F}$ is a tuple $A = (Q, q_0, \Delta)$ where $Q$ is a finite
set of states, $q_0 \in Q$ is the initial state and $\Delta$ is a set of rewrite rules² over the
signature $\mathcal{F} \cup Q$ where the states are seen as unary symbols. The rules in $\Delta$ must
be of the following type:

$q(f(x_1, \ldots, x_n)) \to_A f(q_1(x_1), \ldots, q_n(x_n))$

where $n \ge 0$, $f \in \mathcal{F}_n$, $q, q_1, \ldots, q_n \in Q$, $x_1, \ldots, x_n$ being variables. When $n = 0$,
the rule is therefore of the form $q(a) \to a$. Defining

$L_q(A) = \{t \in T(\mathcal{F}) \mid q(t) \to_A^* t\},$

we denote by $L(A) = L_{q_0}(A)$ the language recognized by $A$.

² The reader should not confuse these rewrite rules, meant as a notation for the tree
automaton, with the rewrite rules in $\mathcal{R}$.
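
Recognition by such an automaton can be sketched in a few lines. The encoding is an assumption of this sketch: the rule set is a dictionary from (state, symbol) to the list of admissible tuples of child states, terms are nested tuples, and constants are strings. The automaton `RULES` is illustrative only.

```python
# Illustrative automaton recognising {encrypt(X, pair(K1, K2)), K1, K2} from q0.
RULES = {
    ("q0", "encrypt"): [("q1", "q2")],
    ("q0", "K1"): [()],
    ("q0", "K2"): [()],
    ("q1", "X"): [()],
    ("q2", "pair"): [("q3", "q4")],
    ("q3", "K1"): [()],
    ("q4", "K2"): [()],
}


def accepts(rules, q, t):
    """True iff t belongs to L_q, i.e. q(t) rewrites to t with the rules."""
    symbol, args = (t, ()) if isinstance(t, str) else (t[0], t[1:])
    return any(
        len(children) == len(args)
        and all(accepts(rules, qi, ti) for qi, ti in zip(children, args))
        for children in rules.get((q, symbol), [])
    )
```

The recursion follows the term top-down, trying every rule applicable at the current state, which mirrors the non-deterministic rewriting in the definition.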

We actually will be using a narrower subclass of tree automata, which we
shall refer to as special automata, over $\mathcal{F}$; we shall note the set of these
automata $\mathcal{A}_{\mathcal{F}}$. Namely, we will require that the set $\Delta$ of rewrite rules defining
the automaton can be partitioned between two subsets:

— rules of the form $q(f(x_1, \ldots, x_n)) \to f(q(x_1), \ldots, q(x_n))$ where $q \in Q$
and $f \in \mathcal{O}_n$; we require that if there exist $n \ge 0$ and $f \in \mathcal{O}_n$ so that
$q(f(x_1, \ldots, x_n)) \to f(q(x_1), \ldots, q(x_n)) \in \Delta$, then
$\forall n \ge 0, \forall f \in \mathcal{O}_n,\ q(f(x_1, \ldots, x_n)) \to f(q(x_1), \ldots, q(x_n)) \in \Delta$;

— rules of the form $q(f(x_1, \ldots, x_n)) \to f(q_1(x_1), \ldots, q_n(x_n))$ where $q, q_1, \ldots,
q_n \in Q$; we require the directed graph $(Q, E)$, whose vertices are the states
and whose arrows are of the form $q\,E\,q_i$ for all the rules of the
above form, to be a tree.




[Figure 2(a): tree diagram]

(a) The tree. The circled nodes represent the states, the others the symbols.

q0(encrypt(x, y)) → encrypt(q1(x), q2(y))
q0(K1) → K1
q0(K2) → K2
q1(X) → X
q2(pair(x, y)) → pair(q3(x), q4(y))
q3(K1) → K1
q4(K2) → K2

(b) The set Δ of rewrite rules.

Fig. 2. An automaton ({q0, …, q4}, q0, Δ) on the signature $\mathcal{O}_c$ with added constants
{X, K1, K2} recognizing {encrypt(X, pair(K1, K2)), K1, K2}.

This suggests a representation of such an automaton by a tree (see the example
in Fig. 2). Such a tree has two kinds of nodes:

— states, which have:

  — an (unordered and possibly empty) list of children, which are all symbol
  nodes;

  — a boolean flag;

— symbols, which have an ordered list of children; there are as many children
as the arity of the symbol.

The semantics of such a tree in terms of rewrite rules is the following:

— a state node $q$ with a child symbol node $s$ whose children are the state nodes
$q_1, \ldots, q_n$, where $q, q_1, \ldots, q_n$ are states and $s$ is an $n$-ary symbol, stands
for the rewrite rule $q(s(x_1, \ldots, x_n)) \to s(q_1(x_1), \ldots, q_n(x_n))$;

— the flag on a state $q$, when true (drawn as a loop on the state node), means the set of
rules $\{q(s(x_1, \ldots, x_n)) \to s(q(x_1), \ldots, q(x_n)) \mid s \in \mathcal{O}_n, n \in \mathbb{N}\}$.

Implementing the special automata as such trees allows for easy sharing of
parts of the data structures.



3.2 Substitution and Matching 

We canonically extend our definition of substitution of terms into terms to
a definition of substitution of languages (sets of terms) into terms with
variables. We furthermore overload this substitution notation to also denote a
substitution function on automata, so that for every term $t$ and automata $A_i$,
$L(t[A_i/X_i]) = t[L(A_i)/X_i]$. Such a substitution function, using only special
automata, can be easily defined by induction on $t$.

Now we consider the reverse problem: given a language $L$ and a term with
variables $t$, give the set of solutions of $L = t[x_i/X_i]$. Such a solution is a family
$(L_i)$ of languages so that $L = t[L_i/X_i]$. We thus consider a function match so
that if $A$ is an automaton and $t$ a term with variables, $match(A, t)$ is a finite
set of maps from $FV(t)$ to automata, and for any solution $S$ in this set,
$L = L(t[S_i/X_i])$. A computational definition follows.

We define $match_l(A, t)$, where $A = (Q, q_0, \Delta)$ is an automaton and $t \in
T(\mathcal{F}, \mathcal{X})$, recursively over the structure of $t$. Its value is a finite set of maps
that give, for each variable of $FV(t)$, the states in which it is to be recognized.

— if $t = s(t_1, \ldots, t_n)$ where $s$ is an $n$-ary symbol, then

$match_l(A, t) = \{\lambda x \in \mathcal{X}.\ \bigcup_{1 \le i \le n} p_i(x) \mid \forall 1 \le i \le n\ p_i \in match_l((Q, q_i, \Delta), t_i),\ r : q_0(s(x_1, \ldots, x_n)) \to_A s(q_1(x_1), \ldots, q_n(x_n)) \in \Delta\}$;

— if $t = x \in \mathcal{X}$ then $match_l((Q, q_0, \Delta), t) = \{[x \mapsto q_0]\}$.

The interesting property of this function is that for every linear³ term $t \in T(\mathcal{F}, \mathcal{X})$,
for every automaton $A = (Q, q_0, \Delta)$, calling $x_1, \ldots, x_n$ the variables in $t$, for all
terms $t_1, \ldots, t_n \in T(\mathcal{F})$, $t[t_1/x_1, \ldots, t_n/x_n] \in L(A)$ if and only if there
exists $p$ in $match_l(A, t)$ so that for all $i$, $t_i \in L_{p(x_i)}(A)$. Informally, that means
that this function returns the set of matches of the term against the automaton,
giving for each match and for each variable the states in which this variable is
to be recognized in that match.

We then construct a function match that has the same property, except that
it does not constrain the terms to be linear.

$match(A, t) = \{f \in match_l(A, t) \mid \forall x \in \mathcal{X}\ \bigcap_{q \in f(x)} L_q(A) \neq \emptyset\}.$

The definition of $match_l$ translates into an algorithm on automata defined
by trees as above. Then match is defined, using an effective test of whether the
languages of several automata intersect [4, §1.7].
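
The two functions can be sketched on general top-down automata rather than on the tree representation of special automata. Everything about the encoding is an assumption of this sketch: rules are a dictionary from (state, symbol) to lists of child-state tuples, a match maps each variable to a frozenset of states, and the intersection test is a naive fixpoint computation, not the procedure of [4].

```python
from itertools import product


def match_l(rules, q, t, variables):
    """All ways of recognising t from state q, as maps variable -> set of states."""
    if isinstance(t, str) and t in variables:
        return [{t: frozenset({q})}]
    symbol, args = (t, ()) if isinstance(t, str) else (t[0], t[1:])
    results = []
    for children in rules.get((q, symbol), []):
        if len(children) != len(args):
            continue
        per_child = [match_l(rules, qi, ti, variables)
                     for qi, ti in zip(children, args)]
        for combo in product(*per_child):
            merged = {}
            for p in combo:  # union of the states required for each variable
                for x, states in p.items():
                    merged[x] = merged.get(x, frozenset()) | states
            results.append(merged)
    return results


def nonempty_intersection(rules, states):
    """True iff some term is recognised from every state in `states`."""
    states = frozenset(states)
    if not states:
        return True
    symbols = {f for (_, f) in rules}

    def alternatives(S):
        # One rule per state of S on a common symbol gives one child set per argument.
        out = []
        for f in symbols:
            for combo in product(*[rules.get((q, f), []) for q in S]):
                if len({len(c) for c in combo}) == 1:
                    out.append([frozenset(c[i] for c in combo)
                                for i in range(len(combo[0]))])
        return out

    deps, todo = {}, [states]
    while todo:  # collect every state set the answer may depend on
        S = todo.pop()
        if S not in deps:
            deps[S] = alternatives(S)
            todo.extend(T for alt in deps[S] for T in alt)
    inhabited, changed = set(), True
    while changed:  # least fixpoint: a set is inhabited if one alternative is
        changed = False
        for S, alts in deps.items():
            if S not in inhabited and any(all(T in inhabited for T in alt) for alt in alts):
                inhabited.add(S)
                changed = True
    return states in inhabited


def match(rules, q, t, variables):
    """Keep the matches whose variables can each be given a single common value."""
    return [f for f in match_l(rules, q, t, variables)
            if all(nonempty_intersection(rules, S) for S in f.values())]
```

For the non-linear pattern `pair(x, x)`, `match_l` only records that `x` must be recognised in two states; `match` then discards the candidate when those two states share no term.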



3.3 The $\mathcal{K}^\sharp$ Function on Automata

We want a function $\mathcal{K}^\sharp$ so that $\mathcal{K}(L(A)) \subseteq L(\mathcal{K}^\sharp(A))$ for every special automaton
$A$. Actually, we shall give such a function so that $\mathcal{K}(L(A)) = L(\mathcal{K}^\sharp(A))$.

We will use a notion of position in a term [11, p. 250] as a sequence of positive
integers describing the path from the root of the term to that position; $\varepsilon$ will be
the root position. $pos(t)$ is the set of positions in term $t$. By $t|_p$ we shall denote
the subterm of $t$ rooted at position $p$. We define the similar notions for trees.

Now we define $completion(A, \mathcal{R})$ (see Fig. 3 for an example), where $A$ is a
special automaton and $\mathcal{R}$ is a simplification system, by induction on the structure
of $A$: calling $q_0$ the initial state of $A$ and calling $C_1, \ldots, C_n$ the children states
of $q_0$, that is, the states two nodes away from $q_0$:

construct A', obtained by replacing in A the subtrees starting from C1, …, Cn
    by their image by a ↦ completion(a, R)
repeat
    for a → x ∈ R do
        for f ∈ match(A', a) do
            if the subtree rooted at f(x) is not already present, modulo state renaming
            then
                copy it, replacing the state f(x) by q0 {adds a child to q0}
            end if
        end for
    end for
until no new subtree is added to A'
return A'

³ A term is said to be linear if all variables have at most one occurrence in it.

Abstracting Cryptographic Protocols with Tree Automata

[Figure 3: two tree diagrams, (a) and (b)]

(a) Before the completion. The dashed subtree is an expansion of paths going through
the loops on q0, for the sake of clarity. The dotted line is the ε-transition we are adding.

(b) After the completion. Not to use real ε-transitions, we add the children of q1 to the
children of q0.

Fig. 3. Completion of the automaton from Fig. 2 by the rewrite rule
decrypt(encrypt(x, k), k) → x.

Termination of this algorithm is ensured by the following property, proved by
induction on the structure of $A$: the set of subtrees of $completion(A, \mathcal{R})$ is,
modulo state renaming, the set of subtrees of $A$. The repeat-until loop only inserts
subtrees that were already present in $A$ modulo state renaming, and thus
terminates, since there are only finitely many of them and it never inserts the
same subtree twice.

We then define

$\mathcal{K}^\sharp(A) = completion(A^{\circ}, \mathcal{R})$

where $A^{\circ}$ is $A$ where the flag on the initial state has been set to true.

4 Abstract Model

The above concrete model has an annoying feature that makes it difficult to an- 
alyze: the infinite nondeterminism of the intruder (the knowledge of the intruder 
is an infinite set). We suppress that difficulty by “folding” together all branches 
of the nondeterminism of the intruder. This approximation is safe, in the sense 
that it always overestimates what the intruder knows. What then remains is a 
system of bounded nondeterminism, corresponding to the various possible
interleavings of the principals. As the number of principals is finite, that gives a finite
state space (although the number of interleavings grows fast with the number of
principals).

4.1 Abstract Semantics 

An abstract global state $S^\sharp$ is made of a tree automaton $S^\sharp.I$ representing
the knowledge of the intruder, and the local states $(S^\sharp.p)_{p \in \mathcal{P}}$. Each local state
$S^\sharp.p$ is made of a program sequence $S^\sharp.p.P$, with the same definition as in the
concrete semantics, and a family $(S^\sharp.p.r)_{r \in R_p}$ of automata.

We define the semantics of the system by a nondeterministic transition
relation $\to^\sharp$. Let $S^\sharp$ and $S'^\sharp$ be two global states. The definition of the transition
relation is the following: $S^\sharp \to^\sharp S'^\sharp$ if there exists $p_0 \in \mathcal{P}$ so that:

— for all $p \in \mathcal{P}$ so that $p \neq p_0$, $S'^\sharp.p = S^\sharp.p$;

— $S^\sharp.p_0.P = h :: \tau$ and either

  — $h = {?r_0}$ and

    • for all $r \in R_{p_0}$ so that $r \neq r_0$, $S'^\sharp.p_0.r = S^\sharp.p_0.r$,

    • $S'^\sharp.p_0.r_0 = S^\sharp.I$

    • $S'^\sharp.p_0.P = \tau$

  — $h = {!t}$ and

    • for all $r \in R_{p_0}$, $S'^\sharp.p_0.r = S^\sharp.p_0.r$,

    • $S'^\sharp.I = \mathcal{K}^\sharp(S^\sharp.I \cup t[S^\sharp.p_0.r/r \mid r \in R_{p_0}])$

    • $S'^\sharp.p_0.P = \tau$

  — $h = r \simeq t$ and either

    • $match(S^\sharp.p_0.r,\ t[S^\sharp.p_0.r/r \mid r \in R_{p_0}]) \neq \emptyset$; then

      ■ for all $\bar{r} \in \bar{R}_{p_0} \setminus FV(t)$, $S'^\sharp.p_0.r = S^\sharp.p_0.r$

      ■ for all $\bar{r} \in FV(t)$, $S'^\sharp.p_0.r = \bigcup \{M.\bar{r} \mid M \in match(S^\sharp.p_0.r,\ t[S^\sharp.p_0.r/r \mid r \in R_{p_0}])\}$⁴

      ■ $S'^\sharp.p_0.P = \tau$

    • $match(S^\sharp.p_0.r,\ t[S^\sharp.p_0.r/r \mid r \in R_{p_0}]) = \emptyset$; then

      ■ for all $r \in R_{p_0}$, $S'^\sharp.p_0.r = S^\sharp.p_0.r$

      ■ $S'^\sharp.p_0.P = \varepsilon$

⁴ Replacing this condition by

$\exists M \in match(S^\sharp.p_0.r,\ t[S^\sharp.p_0.r/r \mid r \in R_{p_0}])\ \forall \bar{r} \in FV(t)\ S'^\sharp.p_0.r = M.\bar{r}$

yields a less coarse abstract model, which still has the good property that
nondeterminism is finite and trace lengths are bounded. The model we use is clearly an
abstraction of this more precise model.

4.2 The Abstraction Relation

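We define an abstraction relation $a \subseteq \Sigma \times \Sigma^\sharp$: for any $S$ in $\Sigma$ and $S^\sharp$ in $\Sigma^\sharp$,

$a(S, S^\sharp) \iff (S.I \subseteq L(S^\sharp.I)) \wedge \forall p \in \mathcal{P}\ \big( (S.p.P = \varepsilon) \vee ( S.p.P = S^\sharp.p.P \wedge \forall r \in R_p\ S.p.r \in L(S^\sharp.p.r) ) \big)$

The correctness of our method relies on the fact that $\to^\sharp$ is an abstraction
of $\to$ with respect to $a$, according to the definition in part 1.2. We prove it by a
straightforward, albeit tedious, case analysis on the different operations.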

4.3 Where the Abstract and Concrete Models Do Not Coincide 

As we are dealing with an approximate model, it is important to know how much
information the model actually loses. There exists a simple example in which our
abstraction strictly overestimates the power of the intruder: a single principal A
runs the very simple program

?r
!decrypt(r, K)

and the intruder initially knows {encrypt(X, K); encrypt(Y, K)}, X, Y and K
being constants initially unknown to the intruder. We want to know whether
at the end of the “protocol”, the intruder can get hold of the concatenation of
X and Y. Straightforwardly, this is impossible in the concrete model, since the
intruder has to choose what it sends to A, and so can have only one of
encrypt(X, K) and encrypt(Y, K) decrypted. However, using the abstract model,
we cannot get this conclusion.

Is this overestimation of the power of the intruder relevant when dealing
with real-life protocols? Our investigations on examples of protocols found in
classic papers on the topic [3] did not show it was a problem; the above kind of
example is largely considered academic by the cryptographic protocol
community. Furthermore, an error that exists only in the approximation for n principals
could well be a concrete error for a greater number of principals. For instance,
with the above example, if we run two copies of A, the intruder really can get
X and Y. For these reasons, we think that the approximation is fine enough.
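
The difference between the two models on this example can be reproduced with a deliberately simplified sketch. It ignores the closure $\mathcal{K}$ entirely (the intruder only replays messages it already holds) and models knowledge as a finite set of tuple-encoded terms; these simplifications and all names are assumptions made for illustration.

```python
def decrypt(m, key):
    """Apply decrypt(encrypt(x, k), k) -> x when it fires, else keep the term."""
    if isinstance(m, tuple) and m[0] == "encrypt" and m[2] == key:
        return m[1]
    return ("decrypt", m, key)


INITIAL = frozenset({("encrypt", "X", "K"), ("encrypt", "Y", "K")})


def concrete_runs(initial):
    """One run per message the intruder may choose to send into register r."""
    return [initial | {decrypt(m, "K")} for m in initial]


def abstract_run(initial):
    """The register holds the whole intruder knowledge, so every choice is folded in."""
    return initial | {decrypt(m, "K") for m in initial}


def learns_both(knowledge):
    return {"X", "Y"} <= knowledge
```

No concrete run yields both X and Y, whereas the single abstract run does, which is the overestimation described above.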

5 Implementation Issues 

Basing ourselves on the above theory, we implemented a protocol analyzer. This 
program takes as input the signature and the rewrite system defining the term 
algebra and a specification of the protocol. 

5.1 The Program 

Our program reads an input file containing:

— the signature of the algebra, divided between “public” and “private”
constructors; private constructors (like keys) cannot be applied by the intruder;

— the rewrite system;

— the initial knowledge of the intruder;

— what the intruder wants to get hold of (set L);

— the programs run by the principals.

It then explores the interleavings of the principal actions, computing with the
abstract operations, and displays the interleavings that seem to exhibit a security
hole (where the abstract knowledge of the intruder contains an element of L).



5.2 Interleavings 

It is not necessary to consider all possible interleavings. We only consider in- 
terleavings that are concatenations of sequences of the following form: inputs 
and matches by a principal, and outputs by the same principal. It is easy to see 
that any interleaving is equivalent (when it comes to the final knowledge of the 
intruder) to such an interleaving. Our implementation therefore only explores 
the interleavings of that form. This drastically reduces the computation time. 
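
One possible reading of this reduction is sketched below: each program is cut into blocks made of a run of inputs and matches followed by a run of outputs, and only interleavings of whole blocks are enumerated. The command encoding (strings, with outputs starting with `!`) and the function names are assumptions of this sketch, not the authors' implementation.

```python
def blocks(program):
    """Split a program into blocks: inputs/matches, then the outputs that follow."""
    out, current, seen_output = [], [], False
    for cmd in program:
        is_output = cmd.startswith("!")
        if seen_output and not is_output:
            out.append(current)  # an input after outputs starts a new block
            current, seen_output = [], False
        current.append(cmd)
        seen_output = seen_output or is_output
    if current:
        out.append(current)
    return out


def interleavings(pending):
    """Yield every ordering of blocks that keeps each principal's own order."""
    if all(not seq for seq in pending.values()):
        yield []
        return
    for principal, seq in pending.items():
        if seq:
            rest = dict(pending)
            rest[principal] = seq[1:]
            for tail in interleavings(rest):
                yield [(principal, seq[0])] + tail
```

For a principal with four commands forming two blocks and another with two commands forming one block, this enumerates 3 schedules instead of the 15 command-level interleavings.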

5.3 Implementation of the Automata 

We tried two implementations of the automata:

— One was closely based on the operations described above on special
automata. Elementary operations, especially because of the use of hashed sets
to test for identical branches, are very fast. The problem is that special
automata have no minimization property and the size of automata grows fast
as the length of the traces grows.

— The other one was operating on minimal deterministic finite tree automata
[4]. Here, it seems that the completion by the rewriting system (implemented
as the insertion of ε-transitions and the final determinization of the
automaton) is very slow.

We also investigated some available toolkits such as MONA [12] and
BANE [7], but did not succeed in using any of them for our particular needs. The
MONA application programming interface is geared towards WS2S logic
applications and handling already computed automata is difficult; on the other hand,
BANE is more geared towards computations on sets of terms, but it seemed
that some useful features were either missing or difficult to implement
without knowing the internals of the library. We are also considering other possible
implementations based on constraint solving [2].

The experimental results we obtained suggest the replacement of the
rewriting system by completion through a system of rules, which is computationally
less expensive. This needs some slight changes in the semantics. An
implementation is under way.

6 Experimental Results

We used the above implementation on some examples, some of them academic
samples, others real protocols from the standard papers on the topic.

6.1 Trials on Small Examples 

We first tried our analyzer (computing on special automata) on some
small examples, among which:

— a single run of the Otway-Rees protocol [3];

— the “Test n” examples: n principals, each running the program

?r
!decrypt(r, $K_n$)

the initial knowledge of the intruder being encrypt(⋯(encrypt(X, $K_1$), …), $K_n$),
and the unknown piece of data the intruder tries to recover being X.

Here are some recorded times and memory footprints:

Example              Pentium II 450 MHz   Sun Ultra 1   Memory used
Otway-Rees, 1 run    21 s                 106 s         10 Mb
Test 6               3 s                  11 s          1 Mb




Alas, while other protocols [3,9], when using a similarly small number of
principals, have been easy to analyze using the program, bigger examples (like two
parallel runs of the Otway-Rees protocol) have made the computation times
explode.

6.2 An Interesting Point on the Otway-Rees Protocol 

An early trial of our program on the Otway-Rees protocol [3] yielded surprising
results. This protocol features a principal A running:

!pair(A, pair(R, pair(M, encrypt(pair(Na, pair(M, pair(A, R))), Kas))))
?r
r ≃ pair(R, pair(A, encrypt(pair(Na, kab), Kas)))
!encrypt(X, kab)

The secret piece of data is X. After these four steps, the intruder can
indeed get X in the following way: at step 2, the intruder sends pair(R, pair(A,
encrypt(pair(Na, pair(A, R)), Kas))), built from pieces of the message output by
A at step 1. A will then use pair(A, R) as kab. On the other hand, reorganizing the
output from step 1, replacing pair(Na, pair(M, pair(A, R))) by pair(pair(Na, M),
pair(A, R)), prevents this attack, and the analyzer then concludes that the
protocol is safe.

Whether or not the bug described above is relevant in real implementations 
depends on how certain primitives, notably pairing, are implemented. Models 
taking associativity and commutativity into account could perhaps be more
suitable for analyses of such properties.

7 Conclusions and Prospects

We proposed a model based on tree automata to abstract cryptographic
protocols. We implemented our algorithms and were able to successfully and correctly
analyze some small instances (2 principals and 1 server) of well-known protocols
and test examples. Our abstraction is fine-grained enough to yield successful
results on real-life protocols.

On the other hand, the complexity of the computation quickly rises as the
number of simulated machines grows. It also seems difficult to accommodate
certain properties, like associativity or commutativity of operators, well. A
more annoying drawback of our current concrete model is that the number of
sessions and principals is fixed. We intend to investigate how to extend our
approach to a model allowing an arbitrary number of sessions to be created, such
as the spi-calculus [1].

It seems that the inefficiencies in our present implementation are essentially
caused by the need to work with the rewriting system. We are currently working
on a related semantics removing the need for such a system (replacing it by
pattern matching), and we hope that performance will then be more adequate.

References 

1. Martin Abadi and Andrew D. Gordon. A calculus for cryptographic protocols: The
spi calculus. Research report 149, Compaq Systems Research Center, Palo Alto,
CA, USA, Jan 1998.

2. Alexander Aiken and Edward L. Wimmers. Solving systems of set constraints
(extended abstract). In Proceedings, Seventh Annual IEEE Symposium on Logic
in Computer Science, pages 329-340, Santa Cruz, California, 22-25 June 1992.
IEEE Computer Society Press.

3. Michael Burrows, Martin Abadi, and Roger Needham. A logic of authentication.
Technical Report 39, Digital Equipment Corporation, Systems Research Centre,
February 1989.

4. Hubert Comon, Max Dauchet, Rémi Gilleron, Denis Lugiez, Sophie Tison, and
Marc Tommasi. Tree Automata Techniques and Applications. Available through
the WWW. In preparation.

5. Patrick Cousot. Méthodes itératives de construction et d'approximation de
points fixes d'opérateurs monotones sur un treillis, analyse sémantique de
programmes. Thèse d'état ès sciences mathématiques, Université scientifique et
médicale de Grenoble, Grenoble, France, 21 mars 1978.

6. Patrick Cousot and Radhia Cousot. Abstract interpretation and application to
logic programs. J. Logic Prog., 13(2-3):103-179, 1992.

7. Manuel Fähndrich. BANE: Analysis Programmer Interface. Computer Science
Department, University of California at Berkeley, 1998.

8. Li Gong. Cryptographic Protocols for Distributed Systems. PhD thesis,
University of Cambridge, Cambridge, England, April 1990.

9. Li Gong, Roger Needham, and Raphael Yahalom. Reasoning about belief in
cryptographic protocols. In IEEE Symposium on Research in Security and Privacy,
pages 234-248, Oakland, California, May 1990. IEEE Computer Society, IEEE
Computer Society Press.







10. Jean Goubault-Larrecq. Clap, a simple language for cryptographic protocols.
Available on the WWW, 1999.

11. Jean-Pierre Jouannaud and Nachum Dershowitz. Rewrite systems. In Jan van
Leeuwen, editor, Handbook of Theoretical Computer Science, volume B. Elsevier,
The MIT Press, 1990.

12. Nils Klarlund and Anders Møller. MONA version 1.3: User Manual. BRICS,
University of Aarhus, 1998.

13. G. Lowe and B. Roscoe. Using CSP to detect Errors in the TMN Protocol. IEEE
Transactions on Software Engineering, 23(10):659-669, 1997.

14. W. Marrero, E.M. Clarke, and S. Jha. Model checking for security protocols.
Technical Report CMU-SCS-97-139, Carnegie Mellon University, May 1997.

15. Catherine Meadows. The NRL Protocol Analyzer: An Overview. Journal of Logic
Programming, 1995. To appear.

16. Jonathan K. Millen. CAPSL: Common authentication protocol specification
language. Available on the WWW.

17. J.C. Mitchell, M. Mitchell, and U. Stern. Automated analysis of cryptographic
protocols using Murphi. In IEEE Symp. Security and Privacy, pages 141-153,
Oakland, 1997.

18. Lawrence C. Paulson. Proving properties of security protocols by induction.
In 10th Computer Security Foundations Workshop, pages 70-83. IEEE Computer
Society Press, 1997.

19. P. Syverson. Adding time to a logic of authentication. In 1st ACM Conference
on Computer and Communications Security, pages 97-101, 1993.

20. P. Syverson and P. C. van Oorschot. On unifying some cryptographic protocol
logics. In 1994 IEEE Computer Society Symposium on Research in Security and
Privacy, pages 14-28, May 1994.




State Space Reduction Based on 
Live Variables Analysis 



Marius Bozga^1, Jean-Claude Fernandez^2, and Lucian Ghirvu^1

^1 VERIMAG, Centre Equation, 2 avenue de Vignate, F-38610 Gières
Marius.Bozga@imag.fr, Lucian.Ghirvu@imag.fr
^2 LSR/IMAG, BP 82, F-38402 Saint Martin d'Hères Cedex
Jean-Claude.Fernandez@imag.fr



Abstract. The intrinsic complexity of most protocol specifications in
particular, and of asynchronous systems in general, led us to study
combinations of static analysis with classical model-checking techniques
as a way to enhance the performance of automated validation tools.
The goal of this paper is to point out that an equivalence on our model
derived from the information on live variables is stronger than strong
bisimulation. This equivalence, further called live bisimulation, exploits
the unused dead values stored either in variables or in queue contents
and allows the state space to be reduced by a significant factor.
Furthermore, this reduction comes almost for free, and it is always possible
to directly generate the quotient model without generating the initial one.

Keywords: model checking, state space reduction, bisimulation, asyn- 
chronous communication, live variables analysis 



1 Introduction 

Formal Description Techniques such as LOTOS [16] or SDL [17] are now at the 
base of a technology for the specification and the validation of telecommunication 
systems. This is due not only to the fact that these formalisms are promoted by 
ITU and other international standardization bodies but also to the availability 
of mature commercial tools, mainly for editing, code generation and testing. 

Alternatively, we have been developing for more than ten years a set of tools
dedicated to the design and validation of critical systems and based on the
model checking paradigm [22,8]. One of them is the model checker ALDEBARAN [7],
maintained and distributed in collaboration with the VASY team of INRIA
Rhône-Alpes as part of the CADP toolset [11]. Another one is the test sequence
generator TGV [13], built upon CADP and jointly developed with the PAMPA project
of IRISA.

The central problem arising in the context of model-based validation, and
implicitly for the above-mentioned tools, is the well-known state explosion
problem. To deal with it, we have more recently begun to investigate alternative
program representations and, more importantly, ways to adapt techniques from
other advanced domains, such as compiler design and optimization, to the context
of model checking. In this respect, we developed IF [5], an intermediate program
representation based on asynchronously communicating timed automata. IF was
designed on the one hand to be able to represent significant subsets of SDL and
LOTOS and on the other hand to support the application of static analysis
techniques used in compiler optimization [2,21]. In particular, a translation
from SDL into IF is already implemented using the SDL/API interface provided by
the industrial tool ObjectGEODE to its SDL compiler.

In general, model checkers, and in particular ALDEBARAN and TGV, are based on
the central notion of bisimulation [20]. In fact, both in the verification
process and in the test generation process there is usually a step of
minimization modulo strong bisimulation. This led us to consider static analysis
techniques for IF programs in the context of bisimulation equivalences.

The main goal of this paper is to point out that an equivalence on our model
derived from the information on live variables is stronger than strong
bisimulation. This equivalence, further called live bisimulation, exploits the
unused dead values stored either in variables or in queue contents and allows
the state space to be reduced by a significant factor. Furthermore, this
reduction comes almost for free, and it is always possible to directly generate
the quotient model without generating the initial one.

The idea of using static analysis to improve model checking has already been
investigated in several specific contexts. For instance, a method to reduce the
number of clocks in timed automata using live clocks and clock equality analysis
was proposed in [10]. A method which combines partial order reductions and
static analysis of independent actions for SDL programs was given in [18].
Important work has been done to find efficient representations of possibly
infinite queue contents and to exploit the static control structure when
performing reachability analysis [4,1]. However, to the best of our knowledge,
we are the first to make use of live variables to simplify the state space,
including queue contents, of asynchronous systems with queue-based
communication.

The paper is structured as follows. Section 2 presents the underlying model,
which consists of parallel processes communicating asynchronously via queues.
Section 3 briefly recalls the notion of live variables and some basic properties
about them. In section 4 we introduce the live equivalence relation on states
and show that it is a bisimulation. An efficient way to identify live equivalent
states using a canonical form is then presented in section 5. In section 6 we
discuss the general utility of the introduced equivalences in the context of
model checking. Finally, some practical results obtained on a small example are
given in section 7.

2 The model

2.1 Syntax 

We consider systems consisting of the asynchronous parallel composition of a
number of processes that communicate through parameterized signal passing via a
set of unbounded FIFO queues and operate on a set of shared variables. Formally,
a system is a tuple:

$$P ::= (S, X, C, \prod_{i=1}^{n} p_i)$$

where $S$ is the set of signals, $X$ is the set of variables, $C$ is the set of
queues and $\{p_i\}_{i=1,n}$ are processes. Processes perform actions on queues
and variables. They are described by terms of a simplified value-passing process
algebra with the following syntax:

$$p ::= nil \mid a.p \mid p + p \mid Z(e)$$

This syntax contains the usual operators from process algebra: $nil$ denotes the
empty process, $a.p$ denotes $p$ prefixed with action $a$, $p_1 + p_2$ is the
nondeterministic choice between $p_1$ and $p_2$, and finally $Z(e)$ stands for
recursion with the value passing of the expression $e$. To give a semantics for
terms we also assume the existence of a set of declarations:

$$Z(y) \stackrel{def}{=} p$$

for each name $Z$ which occurs in terms. Note that in our algebra such a
declaration binds the occurrences of the variable $y$ inside $p$ to $Z$;
therefore $y$ is considered a parameter of the definition of $Z$. The actions
are simple guarded commands defined by the following syntax:

$$a ::= [b]\ a_1$$

$$a_1 ::= x := e \mid c!s(e) \mid c?s(x)$$

where $[b]$, with $b$ a boolean expression, denotes the guard, $x := e$ denotes
the assignment of the expression $e$ to the variable $x$, $c!s(e)$ denotes the
output to the queue $c$ of the signal $s$ with the expression $e$ as parameter,
and finally $c?s(x)$ denotes the input from the queue $c$ of the signal $s$,
storing its parameter in the variable $x$.



2.2 Semantics 



We give the semantics of systems in terms of labeled transition systems. We
proceed in two steps: we start with the interpretation of the control part only,
then we give the interpretation of control combined with data, i.e. queue
contents and variables.



The control semantics is described by the following rules, which give a way to
build the control graph of the system from the parallel composition of
processes. The states in this graph are tuples of terms $\prod_{i=1}^{n} p_i$
and the transitions are labeled by actions, as follows:

$$a.p \xrightarrow{a} p$$

$$\frac{p \xrightarrow{a} p'}{p + q \xrightarrow{a} p'} \qquad \frac{q \xrightarrow{a} q'}{p + q \xrightarrow{a} q'}$$

$$\frac{p[e/y] \xrightarrow{a} p' \qquad Z(y) \stackrel{def}{=} p}{Z(e) \xrightarrow{a} p'}$$

$$\frac{p_j \xrightarrow{a} p'_j}{\prod_{i=1}^{n} p_i \xrightarrow{a} \prod_{i=1}^{n} p'_i} \quad \text{where } p'_i = p_i \text{ for all } i \neq j$$

Note that in this paper we restrict our attention to systems where the control
graph is finite, that is, it contains only a finite number of states and
transitions.

In order to interpret the data part we assume the existence of a universal
domain $D$ which contains the values of variables and signal parameters. We
suppose that the boolean values $\{true, false\}$ and also the special undefined
value $\bot$ are contained in $D$. We define variable contexts as total mappings
$\rho : X \to D$ which associate to each variable $x$ a value $v$ from the
domain. We extend these mappings to expressions in the usual way. We define
queue contexts as total mappings $\delta : C \to (S \times D)^*$ which associate
to each queue $c$ a sequence $(s_1, v_1), \ldots, (s_k, v_k)$ of messages, that
is, pairs $(s, v)$, also written $s(v)$, where $s$ is a signal and $v$ is the
carried parameter value. We also assume the existence of a special undefined
message $\sigma$. The empty sequence is denoted by $\epsilon$.

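The semantics of a system is now completed by the following rules, which give a
way to build a labeled transition system based on the control graph. States of
this system are triples of the form $(\rho, \delta, p)$, where $\rho$ is a
variable context, $\delta$ is a queue context and $p$ is a control state.
Transitions are either internal and labeled with $\tau$, when derived from
assignments or signal inputs, or visible and labeled with $c!s(v)$, when derived
from signal outputs:

$$\frac{p \xrightarrow{[b]\ x := e} p' \qquad \rho(b) = true \qquad \rho(e) = v}{(\rho, \delta, p) \xrightarrow{\tau} (\rho[v/x], \delta, p')}$$

$$\frac{p \xrightarrow{[b]\ c!s(e)} p' \qquad \rho(b) = true \qquad \rho(e) = v \qquad \delta(c) = w}{(\rho, \delta, p) \xrightarrow{c!s(v)} (\rho, \delta[w.s(v)/c], p')}$$

$$\frac{p \xrightarrow{[b]\ c?s(x)} p' \qquad \rho(b) = true \qquad \delta(c) = s(v).w}{(\rho, \delta, p) \xrightarrow{\tau} (\rho[v/x], \delta[w/c], p')}$$

3 Live variables analysis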

In this section we briefly recall the definition of live variables and some
general properties about them. We consider the sets of variables used and,
respectively, defined by some action $a$. Intuitively, a variable is used in
guards, in the right-hand side of assignments, or in outputs. A variable is
defined in the left-hand side of assignments or in inputs. Formally, the sets
$Use(a)$ and $Def(a)$ are defined as follows:

    a               Use(a)                  Def(a)
    [b] x := e      vars(b) ∪ vars(e)       {x}
    [b] c!s(e)      vars(b) ∪ vars(e)       ∅
    [b] c?s(x)      vars(b)                 {x}

We consider now the set of live variables for some term $p$. Intuitively, it is
the smallest set of variables which might be used before they are redefined when
interpreting the term. Formally, the sets $Live(p)$ are defined as the least
fixpoint solution of the following equation system over sets of variables:

$$\begin{aligned}
Live(nil) &= \emptyset \\
Live(a.p) &= Use(a) \cup (Live(p) \setminus Def(a)) \\
Live(p + q) &= Live(p) \cup Live(q) \\
Live(Z(e)) &= Live(p[e/y]) \quad \text{where } Z(y) \stackrel{def}{=} p \\
Live(\prod_{i=1}^{n} p_i) &= \bigcup_{i=1}^{n} Live(p_i)
\end{aligned}$$
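
As an illustration, the least fixpoint can be computed by simple iteration over
the transitions of a finite control graph, using the transition-based
characterization stated in the lemma that follows. The sketch below is only an
assumption-laden example: the `Transition` record, the function name and the
encoding of the receiver process of section 7 (without guards) are hypothetical
and not part of the IF toolset.

```python
from collections import namedtuple

# One control transition p --a--> p', annotated with Use(a) and Def(a).
Transition = namedtuple("Transition", ["src", "use", "defs", "dst"])


def live_variables(states, transitions):
    """Least fixpoint of Live(p) = union over p --a--> p' of
    Use(a) | (Live(p') - Def(a)), computed by iteration from empty sets."""
    live = {p: frozenset() for p in states}
    changed = True
    while changed:
        changed = False
        for t in transitions:
            updated = live[t.src] | t.use | (live[t.dst] - t.defs)
            if updated != live[t.src]:
                live[t.src] = updated
                changed = True
    return live


# Hypothetical encoding of the receiver process of section 7.
X = frozenset({"x"})
NONE = frozenset()
receiver = [
    Transition("normal", NONE, X, "deliver"),    # in?request(x)
    Transition("normal", NONE, NONE, "fault"),   # in?switch
    Transition("deliver", X, NONE, "normal"),    # out!request(x)
    Transition("fault", NONE, X, "fault"),       # in?request(x)
    Transition("fault", NONE, NONE, "normal"),   # in?switch
]
live = live_variables(["normal", "deliver", "fault"], receiver)
print(live)
```

Since the sets only grow and the number of variables is finite, the loop
terminates; a worklist would avoid rescanning all transitions, but is omitted
here for brevity.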

An equivalent characterization of the sets of live variables, based on the
control graph transitions, and some basic properties are given by the following
lemma.

Lemma 1.

1. $Live(p) = \bigcup_{p \xrightarrow{a} p'} Use(a) \cup (Live(p') \setminus Def(a))$.

2. $p \xrightarrow{a} p' \implies Use(a) \subseteq Live(p)$ and
$Live(p') \setminus Def(a) \subseteq Live(p)$.

Proof. 1. Structural induction over term structure. 2. Immediate from 1.

4 Live equivalence



First, we consider the variable context equivalence relation induced by the set
of live variables: two variable contexts are equivalent if and only if they
assign identical values to each of the live variables. Formally, for each
control state $p$ we define the variable context equivalence $\approx_p^{live}$
as follows:

$$\rho_1 \approx_p^{live} \rho_2 \iff \forall x \in Live(p) \quad \rho_1(x) = \rho_2(x)$$

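We consider similar equivalence relations defined for queue contexts.
Intuitively, two queue contexts are equivalent if and only if they enable the
same sequences of transitions and if, for each enabled input, the parameter
values are identical whenever the receiving variable is live at the next state.
Formally, we define the queue context equivalences $\approx_p^{live}$ to be the
greatest fixpoint solution of the following equation system where, for each
control state $p$, we have:

$$\delta_1 \approx_p^{live} \delta_2 \iff
\begin{cases}
\delta_1 \approx_{p'}^{live} \delta_2 & \text{for each } p \xrightarrow{a} p' \text{ where } a \text{ is not an input} \\[4pt]
\begin{array}{l}
(\delta_1(c) = \epsilon \wedge \delta_2(c) = \epsilon \wedge \delta_1 \approx_{p'}^{live} \delta_2) \\
\vee\ (\delta_1(c) = s(v_1).w_1 \wedge \delta_2(c) = s(v_2).w_2 \wedge (x \in Live(p') \Rightarrow v_1 = v_2) \wedge \delta_1[w_1/c] \approx_{p'}^{live} \delta_2[w_2/c]) \\
\vee\ (\delta_1(c) = s_1(v_1).w_1 \wedge \delta_2(c) = s_2(v_2).w_2 \wedge s_1 \neq s \wedge s_2 \neq s)
\end{array} & \text{for each } p \xrightarrow{[b]\ c?s(x)} p'
\end{cases}$$

If the control graph is finite, the existence of the greatest fixpoint
satisfying the equations above is ensured by Tarski's fixpoint theorem, given
the monotonicity with respect to relation inclusion at each state.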

We define now the live equivalence relation $\approx^{live}$ over global states:
two states are equivalent if the control states are identical and both the
variable contexts and the queue contexts are equivalent:

$$(\rho_1, \delta_1, p) \approx^{live} (\rho_2, \delta_2, p) \iff \rho_1 \approx_p^{live} \rho_2 \ \wedge \ \delta_1 \approx_p^{live} \delta_2$$

The next theorem gives the central result of the paper, that is, the live
equivalence is also a bisimulation relation over the global states.

Theorem 1. The live equivalence $\approx^{live}$ is a bisimulation.




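Proof. By definition, $\approx^{live}$ is a bisimulation if and only if for all
pairs $(\rho_1, \delta_1, p) \approx^{live} (\rho_2, \delta_2, p)$ we have:

$$\forall (\rho_1, \delta_1, p) \xrightarrow{\ell} (\rho'_1, \delta'_1, p') \implies \exists (\rho_2, \delta_2, p) \xrightarrow{\ell} (\rho'_2, \delta'_2, p') \ \wedge \ (\rho'_1, \delta'_1, p') \approx^{live} (\rho'_2, \delta'_2, p')$$

$$\forall (\rho_2, \delta_2, p) \xrightarrow{\ell} (\rho'_2, \delta'_2, p') \implies \exists (\rho_1, \delta_1, p) \xrightarrow{\ell} (\rho'_1, \delta'_1, p') \ \wedge \ (\rho'_1, \delta'_1, p') \approx^{live} (\rho'_2, \delta'_2, p')$$

Let $(\rho_1, \delta_1, p) \approx^{live} (\rho_2, \delta_2, p)$ and let
$p \xrightarrow{a} p'$. We distinguish three different cases, depending on the
type of $a$.

1. $a = [b]\ x := e$

$$\forall i = 1,2 \quad \rho_i(e) = v_i, \ \rho_i(b) = true \implies (\rho_i, \delta_i, p) \xrightarrow{\tau} (\rho'_i, \delta_i, p'), \quad \rho'_i = \rho_i[v_i/x]$$

$$\rho_1 \approx_p^{live} \rho_2 \implies \rho_1(b) = \rho_2(b), \ \rho_1(e) = \rho_2(e), \ \rho_1[v_1/x] \approx_{p'}^{live} \rho_2[v_2/x]$$

$$\delta_1 \approx_p^{live} \delta_2 \implies \delta_1 \approx_{p'}^{live} \delta_2$$

2. $a = [b]\ c!s(e)$

$$\forall i = 1,2 \quad \delta_i(c) = w_i, \ \rho_i(e) = v_i, \ \rho_i(b) = true \implies (\rho_i, \delta_i, p) \xrightarrow{c!s(v_i)} (\rho_i, \delta'_i, p'), \quad \delta'_i = \delta_i[w_i.s(v_i)/c]$$

$$\rho_1 \approx_p^{live} \rho_2 \implies \rho_1(b) = \rho_2(b), \ \rho_1(e) = \rho_2(e), \ \rho_1 \approx_{p'}^{live} \rho_2$$

$$\delta_1 \approx_p^{live} \delta_2 \implies \delta_1 \approx_{p'}^{live} \delta_2 \implies \delta_1[w_1.s(v_1)/c] \approx_{p'}^{live} \delta_2[w_2.s(v_2)/c]$$

Note that the second implication comes from the more general property of
queue content equivalence being closed under output operations. That is, it
can be proven inductively that two queue contents that are equivalent at some
state $p$ are still equivalent after adding the same message to the same queue
in both of them.

3. $a = [b]\ c?s(x)$

$$\forall i = 1,2 \quad \delta_i(c) = s(v_i).w_i, \ \rho_i(b) = true \implies (\rho_i, \delta_i, p) \xrightarrow{\tau} (\rho'_i, \delta'_i, p'), \quad \rho'_i = \rho_i[v_i/x], \ \delta'_i = \delta_i[w_i/c]$$

$$\delta_1 \approx_p^{live} \delta_2 \implies v_1 = v_2 \text{ if } x \in Live(p'), \quad \delta_1[w_1/c] \approx_{p'}^{live} \delta_2[w_2/c]$$

$$\rho_1 \approx_p^{live} \rho_2 \implies \rho_1(b) = \rho_2(b), \ \rho_1[v_1/x] \approx_{p'}^{live} \rho_2[v_2/x]$$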

An immediate consequence of the previous theorem is the following one, stating
that the live equivalence is stronger than strong bisimulation.

Theorem 2. $\approx^{live} \ \subseteq \ \sim$, where $\sim$ denotes strong
bisimulation.

5 Reset equivalence 

We would like to have a more efficient way to check that two contexts are live 
equivalent than to directly check the definition. Here we investigate the possi- 
bility to transform contexts into some canonical form preserving the live equiv- 
alence. One way to do this is using the family of Reset functions, defined below. 

The $Reset$ function on variable contexts $\rho$ basically sets the value of the
dead variables to the fixed undefined value $\bot$, depending on the current
control state $p$. A reset equivalence can be defined over variable contexts as
follows: two contexts are reset equivalent if they give the same result when
reset. Formally, we have:

$$Reset(p, \rho) = \rho^* \quad \text{where} \quad \rho^*(x) = \begin{cases} \rho(x) & \text{if } x \in Live(p) \\ \bot & \text{otherwise} \end{cases}$$

$$\rho_1 \approx_p^{reset} \rho_2 \iff Reset(p, \rho_1) = Reset(p, \rho_2)$$

It is straightforward to prove that the live equivalence and the reset
equivalence on variable contexts are identical, that is, the following lemma
holds.

Lemma 2. $\forall p \quad \approx_p^{reset} \ = \ \approx_p^{live}$
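
The coincidence of the two equivalences on variable contexts can be checked on a
small case. The following sketch is only illustrative: it assumes that both
contexts are total over the same set of variables, uses Python's `None` to
stand for the undefined value, and the function names are hypothetical.

```python
BOTTOM = None  # stands for the undefined value (assumed not to be a proper value)


def reset_vars(live_at_p, rho):
    """Canonical form of a variable context: dead variables are set to BOTTOM."""
    return {x: (v if x in live_at_p else BOTTOM) for x, v in rho.items()}


def live_equivalent(live_at_p, rho1, rho2):
    """Direct check of the definition: agreement on every live variable."""
    return all(rho1[x] == rho2[x] for x in live_at_p)


def reset_equivalent(live_at_p, rho1, rho2):
    """Check via canonical forms: equal results after resetting."""
    return reset_vars(live_at_p, rho1) == reset_vars(live_at_p, rho2)


rho1 = {"x": 3, "y": 0}
rho2 = {"x": 7, "y": 0}

# Where only y is live, the contexts are identified; where x is live, they are not.
same_when_x_dead = reset_equivalent({"y"}, rho1, rho2)
same_when_x_live = reset_equivalent({"x", "y"}, rho1, rho2)
print(same_when_x_dead, same_when_x_live)
```

The practical benefit of the canonical form is that equivalence checking reduces
to plain equality, so canonical states can be stored in an ordinary hash table.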



In order to define a similar $Reset$ function for queue contexts we start by
introducing some auxiliary notations.

Let us consider some fixed queue $c$. We define the relation
$\xrightarrow{\tau_c}$ over control states to include all the control
transitions, except the ones labeled with inputs from the queue $c$. We denote
by $\xrightarrow{\tau_c}{}^*$ its reflexive and transitive closure and by
$Post_{\tau_c}^*$ the post image function defined over sets of control states.
Formally:

$$p \xrightarrow{\tau_c} p' \iff \exists a \neq [b]\ c?s(x) \quad p \xrightarrow{a} p'$$

$$Post_{\tau_c}^*(Q) = \{\, p' \mid \exists p \in Q \quad p \xrightarrow{\tau_c}{}^* p' \,\}$$

Let us consider $a = [b]\ c?s(x)$ to be some input action of signal $s$ from
queue $c$. We write $p \stackrel{a}{\Longrightarrow} p'$ if it is possible to
reach $p'$ from $p$ by a sequence ending with $a$ and which does not contain any
other input from the queue $c$. We also define the post image function
$Post_{c?s}$ over sets of control states, which gives all the states that can be
reached after consuming an $s$ from the queue $c$. Formally, we have:

$$p \stackrel{[b]\ c?s(x)}{\Longrightarrow} p' \iff \exists p'' \quad p \xrightarrow{\tau_c}{}^* p'' \ \wedge \ p'' \xrightarrow{[b]\ c?s(x)} p'$$

$$Post_{c?s}(Q) = \{\, p' \mid \exists p \in Q \quad p \stackrel{[b]\ c?s(x)}{\Longrightarrow} p' \,\}$$

We define now the local reset function for the queue $c$ given an initial
non-empty set of control states $Q$. That is, given the content $w$ of the queue
$c$, this function basically rewrites it and forgets the signal parameters which
are statically detected to be unused when execution starts somewhere in $Q$.
With $\sigma$ denoting the undefined message, the function is recursively
defined as follows:

$$reset(Q, c, \epsilon) = \begin{cases} \sigma & \text{if } \forall s \ \ Post_{c?s}(Q) = \emptyset \\ \epsilon & \text{otherwise} \end{cases}$$

$$reset(Q, c, s(v).w) = \begin{cases}
\sigma & \text{if } Post_{c?s}(Q) = \emptyset \\
s(v).reset(Post_{c?s}(Q), c, w) & \text{if } \exists p \in Q, \ p \stackrel{[b]\ c?s(x)}{\Longrightarrow} p' \wedge x \in Live(p') \\
s(\bot).reset(Post_{c?s}(Q), c, w) & \text{if } \forall p \in Q, \ p \stackrel{[b]\ c?s(x)}{\Longrightarrow} p' \Rightarrow x \notin Live(p')
\end{cases}$$

Some interesting properties of the local reset function are given by the
following lemma.

Lemma 3. For any $Q$, $c$, $w$, $w_1$, $w_2$ the following holds:

1. $reset(Q, c, w) = reset(Post_{\tau_c}^*(Q), c, w)$.

2. $reset(Q, c, w_1) = reset(Q, c, w_2) \implies \forall Q' \subseteq Q \quad reset(Q', c, w_1) = reset(Q', c, w_2)$.

Proof. 1. The proof is immediate from the fact that for any set of control
states $Q$, channel $c$ and signal $s$ we have:

$$Q \subseteq Post_{\tau_c}^*(Q) \qquad Post_{c?s}(Q) = Post_{c?s}(Post_{\tau_c}^*(Q))$$

2. The proof can be done by induction on the maximal size of the sequences $w_1$
and $w_2$.



We define the $Reset$ function and the reset equivalence on queue contexts using
the local reset functions defined before, that is, for a given control state $p$
we consider:

$$Reset(p, \delta) = \delta^* \quad \text{where} \quad \delta^*(c) = reset(\{p\}, c, \delta(c))$$

$$\delta_1 \approx_p^{reset} \delta_2 \iff Reset(p, \delta_1) = Reset(p, \delta_2)$$
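
The local reset of one queue can be sketched directly from the definitions
above. The code below is a simplified illustration under several assumptions:
it works on the control graph of a single process (the receiver of section 7),
guards are ignored, queue contents are tuples of (signal, value) pairs, and all
names and encodings are hypothetical rather than taken from an existing tool.

```python
SIGMA = "sigma"   # stands for the undefined message
BOTTOM = None     # stands for the undefined value


def is_input_from(action, c):
    return action[0] == "in" and action[1] == c


def tau_closure(Q, c, transitions):
    """States reachable from Q without performing any input from queue c."""
    seen, stack = set(Q), list(Q)
    while stack:
        p = stack.pop()
        for src, action, dst in transitions:
            if src == p and not is_input_from(action, c) and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def inputs_after(Q, c, s, transitions):
    """All (x, p') such that some p in Q reaches p' by a sequence ending with an
    input c?s(x) and containing no other input from c."""
    closure = tau_closure(Q, c, transitions)
    return [(action[3], dst) for src, action, dst in transitions
            if src in closure and is_input_from(action, c) and action[2] == s]


def reset(Q, c, w, transitions, live, signals):
    """Local reset of the content w of queue c, starting from the states Q."""
    if not w:
        dead_end = all(not inputs_after(Q, c, s, transitions) for s in signals)
        return (SIGMA,) if dead_end else ()
    (s, v), rest = w[0], w[1:]
    reached = inputs_after(Q, c, s, transitions)
    if not reached:
        return (SIGMA,)            # the head can never be consumed
    post = {dst for _, dst in reached}
    used = any(x in live[dst] for x, dst in reached)
    head = (s, v if used else BOTTOM)
    return (head,) + reset(post, c, rest, transitions, live, signals)


# Actions are ("in" | "out", queue, signal, variable or None).
receiver = [
    ("normal", ("in", "in", "request", "x"), "deliver"),
    ("normal", ("in", "in", "switch", None), "fault"),
    ("deliver", ("out", "out", "request", "x"), "normal"),
    ("fault", ("in", "in", "request", "x"), "fault"),
    ("fault", ("in", "in", "switch", None), "normal"),
]
live = {"normal": set(), "deliver": {"x"}, "fault": set()}
signals = ["request", "switch"]

w1 = (("request", 5), ("switch", None), ("request", 7))
w2 = (("request", 9), ("switch", None), ("request", 7))
r1 = reset({"fault"}, "in", w1, receiver, live, signals)
r2 = reset({"fault"}, "in", w2, receiver, live, signals)
r_out = reset({"normal"}, "out", (("request", 1),), receiver, live, signals)
print(r1, r2, r_out)
```

In this sketch the first request is consumed at the fault state, where its
parameter is dead, so both contents get the same canonical form; the content of
the out queue, which is never read, collapses to the undefined message.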

The following lemma gives the relation between the live equivalence and the
reset equivalence on queue contexts. In general, the reset equivalence is
stronger than the live equivalence. This is explained by the fact that local
resets are based on a conservative assumption about other queues: the inputs
from other queues are always considered enabled. However, in the special case
when the input actions from a queue do not enable input actions from other
queues, the live equivalence and the reset equivalence become identical. This
can be formalized as follows. For $a$ an input action, we write
$p \stackrel{a}{\Longrightarrow}_* p'$ if $p'$ is reachable from $p$ by a
sequence ending with $a$ and which does not contain other inputs. We define now
queue $c$ to be reset-independent from queue $c'$ if and only if:

$$\forall p, q, q' \quad p \stackrel{[b']\ c'?s'(x')}{\Longrightarrow}_* q, \ \ q \stackrel{[b]\ c?s(x)}{\Longrightarrow}_* q' \implies \exists p' \quad p \stackrel{[b]\ c?s(x)}{\Longrightarrow}_* p', \ \ p' \stackrel{[b']\ c'?s'(x')}{\Longrightarrow}_* q', \ \ x \in Live(p') \Leftrightarrow x \in Live(q')$$

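Lemma 4.

1. $\forall p \quad \approx_p^{reset} \ \subseteq \ \approx_p^{live}$

2. if the queues are reset-independent then
$\forall p \quad \approx_p^{reset} \ = \ \approx_p^{live}$

Proof. 1. The proof consists in checking that the reset equivalences
$\approx_p^{reset}$ satisfy the fixpoint equations defining the live
equivalence. This can easily be done given the results established by the
previous lemma. Then, because the live equivalences are the greatest fixpoint,
it is clear that $\approx_p^{reset} \subseteq \approx_p^{live}$ for all control
states $p$.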

2. Let $\delta_1 \approx_p^{live} \delta_2$ and let the queue $c$ be fixed. We
will prove inductively that
$reset(\{p\}, c, \delta_1(c)) = reset(\{p\}, c, \delta_2(c))$. We distinguish
the following cases:

1. $\delta_1(c) = \epsilon$, $\delta_2(c) = \epsilon$
   $\implies reset(\{p\}, c, \delta_1(c)) = reset(\{p\}, c, \delta_2(c))$

2. $\delta_1(c) = \epsilon$, $\delta_2(c) = s_2(v_2).w_2$

   (a) $\forall s \ \ Post_{c?s}(\{p\}) = \emptyset$
       $\implies reset(\{p\}, c, \delta_1(c)) = reset(\{p\}, c, \delta_2(c)) = \sigma$

   (b) $\exists s \ \ Post_{c?s}(\{p\}) \neq \emptyset$
       $\implies \exists \ p = p_0 \xrightarrow{\tau_c} p_1 \ldots \xrightarrow{\tau_c} p_n \xrightarrow{[b]\ c?s(x)} p'$
       and because the inputs are independent we can choose this sequence such
       that it does not contain inputs from other queues
       $\implies$ we obtain a contradiction with the live equivalence
       hypothesis, that is, $\delta_1 \not\approx_p^{live} \delta_2$

3. $\delta_1(c) = s_1(v_1).w_1$, $\delta_2(c) = s_2(v_2).w_2$

   (a) $Post_{c?s_1}(\{p\}) = \emptyset$, $Post_{c?s_2}(\{p\}) = \emptyset$
       $\implies reset(\{p\}, c, \delta_1(c)) = reset(\{p\}, c, \delta_2(c)) = \sigma$

   (b) $Post_{c?s_1}(\{p\}) \neq \emptyset$, $Post_{c?s_2}(\{p\}) = \emptyset$
       $\implies$ contradiction with the live equivalence, as before.

   (c) $Post_{c?s_1}(\{p\}) \neq \emptyset$, $Post_{c?s_2}(\{p\}) \neq \emptyset$

       i. $s_1 \neq s_2$
          $\implies$ contradiction with the live equivalence, as before.

       ii. $s_1 = s_2 = s$ and $v_1 \neq v_2$ and
           $\exists \ p \stackrel{[b]\ c?s(x)}{\Longrightarrow} p'$ with $x \in Live(p')$
           $\implies$ contradiction with the live equivalence, as before.

       iii. $s_1 = s_2 = s$ and ($v_1 = v_2 = v$ or
            $\forall \ p \stackrel{[b]\ c?s(x)}{\Longrightarrow} p' \quad x \notin Live(p')$)
            $\implies reset(\{p\}, c, s_1(v_1).w_1) = s(v).reset(Post_{c?s}(\{p\}), c, w_1)$ and
            $reset(\{p\}, c, s_2(v_2).w_2) = s(v).reset(Post_{c?s}(\{p\}), c, w_2)$
            (with $v$ replaced by $\bot$ in the second case),
            and from the live equivalence hypothesis we also have that
            $\forall p' \in Post_{c?s}(\{p\}) \quad \delta_1[w_1/c] \approx_{p'}^{live} \delta_2[w_2/c]$,
            so we can inductively infer either a contradiction or
            $reset(\{p\}, c, s_1(v_1).w_1) = reset(\{p\}, c, s_2(v_2).w_2)$



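Finally, we define the reset equivalence over global states as follows:

$$(\rho_1, \delta_1, p) \approx^{reset} (\rho_2, \delta_2, p) \iff \rho_1 \approx_p^{reset} \rho_2 \ \wedge \ \delta_1 \approx_p^{reset} \delta_2$$

The link between the live and reset equivalence over global states is
established by the following theorem.

Theorem 3.

1. $\approx^{reset} \ \subseteq \ \approx^{live}$

2. if the queues are reset-independent then:
$\approx^{reset} \ = \ \approx^{live}$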

Finally, note that SDL systems satisfy the independence hypothesis as they are
composed of parallel processes, each one having its own unique input queue.
Thus, performing a signal input in a process does not interfere with other
possible inputs in other processes.

6 Live analysis and model checking

The reductions based on live or reset equivalence can be obtained with almost 
no cost and are fully orthogonal to other techniques applied in the context of 
model-checking validation to deal with the state-explosion problem. 

The low cost of the live equivalence reduction is due to the static analysis
approach: live variables need to be computed only once, at the beginning, with
an algorithm operating on the control graph. They are then used by primitives
such as Reset, which operate on state contents and have a linear complexity with
respect to the size of the state (the number of variables plus the number of
messages in the queues). In fact, in the context of exhaustive state space
exploration algorithms, such primitives can be viewed as having constant
operation time, similar to operations like state copying or state comparison.

Live equivalence reduction can be combined with techniques ranging from the
simplest model generation, through standard on-the-fly verification methods, to
more sophisticated partial order or symbolic techniques.

Enumerative simulation is at the basis of most verification algorithms. It is
always possible to directly generate the quotient graph with respect to the live
or reset equivalences, thus preserving all the observable behaviors, without
constructing the initial one. This can easily be done, for instance, by
replacing each newly generated state with its canonical form given by Reset.
Reset equivalent states will then automatically be identified as equal and
explored only once by the generation algorithm.
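
The direct generation of the quotient graph can be sketched as an ordinary
breadth-first exploration in which every new state is first put into canonical
form. The model below is a deliberately simplified toy, loosely inspired by the
receiver process of section 7: the queue is abstracted away (a request with any
value may arrive at any time), the domain size `M` is an assumed parameter, and
the state counts it produces are those of this sketch only, not of the
experiments reported in this paper.

```python
from collections import deque

M = 4  # assumed size of the domain of x


def successors(state):
    """Successor states of the toy receiver; a state is (control, x)."""
    control, x = state
    if control == "normal":
        # in?request(x) with any value, or in?switch
        return [("deliver", v) for v in range(M)] + [("fault", x)]
    if control == "deliver":
        return [("normal", x)]                # out!request(x)
    # control == "fault": requests are consumed and lost, or in?switch
    return [("fault", v) for v in range(M)] + [("normal", x)]


def canonical(state):
    """Reset: x is live only at deliver; elsewhere it is set to None."""
    control, x = state
    return (control, x if control == "deliver" else None)


def explore(initial, canon=lambda s: s):
    """Breadth-first generation storing states in canonical form only."""
    start = canon(initial)
    seen, todo = {start}, deque([start])
    while todo:
        for nxt in map(canon, successors(todo.popleft())):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


full = explore(("normal", 0))
reduced = explore(("normal", 0), canonical)
print(len(full), len(reduced))
```

For this toy model the number of stored states drops from 3M to M + 2, and the
unreduced graph is never built when the canonical form is applied on the fly.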

On-the-fly verification techniques such as [12] rely on enumerative depth-first
traversals of the model state space and the simultaneous evaluation of the
property (e.g., temporal logic formula, observer automaton, test purpose). In
this case too, the property can be directly evaluated on the reduced graph,
provided it does not explicitly depend on state variables or queue contents but
only on the observable actions of the system (e.g., outputs). Note that, if such
dependencies exist, they can normally be removed by introducing auxiliary
observable actions and by modifying the property accordingly.

Partial order techniques [15] reduce the state space by avoiding the exploration
of interleavings of independent actions. Clearly, the live equivalence reduction
can also be applied whenever new states are encountered, as before.

In the symbolic model checking context [19], the information on live variables
can be used to simplify the BDDs representing the transition relation and the
sets of states. In fact, dead variables for a given control state provide
non-trivial don't-care conditions which can be exploited either directly, to
simplify any final or intermediate result, or, even better, to improve
primitives like successor or predecessor computation on symbolic
representations.

In particular, live equivalence reduction can also be applied at each step of
the minimal model generation algorithm [14], which involves the (backward)
computation of predecessor states. This may be very important, as the
predecessor computation usually yields many spurious unreachable states, which
might be equivalent w.r.t. the live equivalence.

7 Example

We illustrate the potential reduction obtained using the live equivalence on
a small example. We consider two simple processes communicating through a
queue as shown in figure 1. The first process only sends request and switch
messages, in arbitrary order. It is described by $Z_{dummy}$ and the following
equation:

$$Z_{dummy} \stackrel{def}{=} (in!switch + in!request(*)).Z_{dummy}$$

The second process is more interesting. It has two functioning modes, a normal
one and a fault one. Basically, it inputs requests from the in queue and
delivers them when normal, and loses them otherwise. The mode changes from
normal to fault (and back) when a switch signal is input. Formally, we consider
this process described by $Z_{normal}$ and the following equations:

$$Z_{normal} \stackrel{def}{=} in?request(x).Z_{deliver} + in?switch.Z_{fault}$$

$$Z_{deliver} \stackrel{def}{=} out!request(x).Z_{normal}$$

$$Z_{fault} \stackrel{def}{=} in?request(x).Z_{fault} + in?switch.Z_{normal}$$




We can easily check that the variable $x$ is live only at the deliver state.
Thus, the value of $x$ need not be used to distinguish between states anywhere
other than at deliver. Furthermore, it can be seen that at the fault state the
parameters of the incoming requests are not live either. That is, request
signals are only consumed, without using the carried values. It can also be seen
that signals of the out queue are never consumed, so we do not distinguish
states which differ only in the out content. Such cases are all captured by the
live equivalence as defined in section 4.

Figure 2 shows the reduction obtained for this example with respect to two
parameters: $m$, the size of the domain of $x$, and $n$, the maximal allowed
size of queue in. The continuous lines give the real number of states and the
dotted lines give the number of states obtained after live equivalence
reduction. Note that a logarithmic scale is used for the number-of-states axis.
To better illustrate the reduction obtained, note for instance that when
$m = 6$ and $n = 5$ the initial number of states is 352944 and it is reduced to
89824, that is, by about 75%.
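
The figure of about 75% follows from the two state counts quoted above; the
short computation below only restates that arithmetic (the variable names are
illustrative).

```python
initial_states = 352944   # m = 6, n = 5, without reduction
reduced_states = 89824    # same parameters, after live equivalence reduction

# Fraction of states removed, and the equivalent division factor.
reduction = 1 - reduced_states / initial_states
ratio = initial_states / reduced_states
print(f"{reduction:.1%} fewer states, a factor of {ratio:.2f}")
```

In other words, for these parameters the reduced state space is a little more
than a quarter of the original one.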





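[Figure 2: two plots of the number of states (logarithmic axis), one for m = 3
with n = 1..10 and one for n = 3 with m = 1..10]

Fig. 2. Live equivalence reduction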



8 Conclusion and future work 

The essence of this work is to show that live variable analysis defines an
equivalence stronger than bisimulation equivalence. This allows the state space
to be simplified by exploiting the information on unused dead values stored
either in the variables or in the queue contents. This idea has already been
applied to the industrial case study SSCOP [6], where we obtained impressive
reductions of the state space, by a factor of up to 200.

In the context of model-based validation, the main interest of static analysis
is to reduce the state space, which is crucial in practice to deal with complex
specifications. Model checking and automatic test generation for conformance
testing are based on the same algorithmic techniques: given a specification and
a property, they compute, in the first case, the validity of the property and,
in the second case, the test case with respect to both the specification and the
property. We are currently developing IF [5], a toolbox for a high-level
representation of programs, especially for telecommunication specification
languages. In this toolbox, static analysis is intensively used in the earlier
stages of validation in order to reduce the memory costs of the later ones.

This led us to consider two classes of analysis. Property-independent analysis
(such as live variable analysis or constant propagation) does not consider any
particular property; the analysis is implemented by a module which takes an
intermediate program as input and produces one as output. Property-dependent
analysis takes into account some information extracted from the property or from
the environment and propagates it over the static control structure of the
program (such as uncontrollable variables abstraction [9]).

We plan to experiment with more sophisticated analyses, such as constraint
propagation in the context of symbolic test generation. We also want to exploit a
connection with the tool InVeSt [3], which computes abstractions and invariants
on a set of guarded commands.
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Abstract. This paper presents an efficient and effective code optimiza- 
tion algorithm for eliminating partially dead assignments, which become 
redundant on execution of specific program paths. It is one of the most 
aggressive compiling techniques, including invariant code motion from 
loop bodies. Since the traditional techniques proposed for this optimization
produce second-order effects such as sinking-sinking effects, they must be
applied repeatedly to eliminate dead code completely, at a higher computation
cost. Furthermore, there is a restriction that assignments sunk to a join point
of the control flow must be lexically identical.

Our technique can eliminate possibly more dead assignments without the
restriction at join points, using an explicit representation of data dependence
relations within a program in SSA (Static Single Assignment) form. This
representation, called Extended Value Graph (EVG), shows the computationally
equivalent structure among assignments before and after moving them on the
control flow graph. We obtain the final result directly with a single application
of this technique, because it captures the second-order effects as main
effects, based on the EVG.



1 Introduction 

Dead code elimination [1] has been used to improve the efficiency of program
execution by eliminating unnecessary instructions. If the variable on the left-hand
side of an assignment is not used in the continuation of a program, the assignment
can be eliminated. Such an assignment is called totally dead. However, as
illustrated by y := a + b at node 1 in Fig. 1(a), if the left-hand side variable is
not used on the left branch but is used on the right, the assignment cannot be
eliminated directly. In this case, it can be made totally dead at node 2 by
sinking y := a + b from node 1 to the entries of nodes 2 and 3. After that, the
assignment at node 2, now totally dead, can be eliminated as shown in Fig. 1(b).
Such an assignment is called partially dead, and this elimination technique is
called partial dead code elimination (PDE) [11].

The concept of PDE also includes loop-invariant code motion, as shown in
Fig. 1(d), where y := a + b in Fig. 1(c) is a partially dead assignment because the
variable y is dead on the back path to the beginning of the loop body.
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(a) (b) (c) (d) 



Fig. 1. Partial dead code elimination and loop-invariant code motion 
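The distinction between totally dead, partially dead and live assignments can be sketched with an ordinary backward liveness analysis. The code below is a minimal illustration, not the algorithm of this paper. The diamond-shaped CFG is a hypothetical stand-in for Fig. 1(a), whose exact contents are not reproduced here, and the names `live_in_sets` and `classify` are ours.

```python
def live_in_sets(succs, use, defs):
    """Backward liveness: live_in[n] = use[n] | (live_out[n] - defs[n])."""
    live_in = {n: set() for n in succs}
    changed = True
    while changed:
        changed = False
        for n in succs:
            live_out = set().union(*(live_in[s] for s in succs[n]))
            new_in = use[n] | (live_out - defs[n])
            if new_in != live_in[n]:
                live_in[n] = new_in
                changed = True
    return live_in


def classify(var, node, succs, live_in):
    """Classify an assignment to `var` placed at the end of `node`."""
    live_on = [var in live_in[s] for s in succs[node]]
    if not any(live_on):
        return "totally dead"
    if not all(live_on):
        return "partially dead"
    return "live"


# Hypothetical diamond: node 1 computes y := a + b, node 2 overwrites y
# without reading it, node 3 is empty, node 4 outputs y.
succs = {1: [2, 3], 2: [4], 3: [4], 4: []}
use = {1: {"a", "b"}, 2: {"c"}, 3: set(), 4: {"y"}}    # upward-exposed uses
defs = {1: {"y"}, 2: {"y"}, 3: set(), 4: set()}

live_in = live_in_sets(succs, use, defs)
status = classify("y", 1, succs, live_in)
print(status)
```

In this example y is needed on the path through node 3 only, so the assignment at node 1 is classified as partially dead, which is the situation that sinking is meant to resolve.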



In previous work [8,11,14], all assignments sunk to a join point are required
to be lexically identical, i.e. to have the same left-hand side variable, the same
operators and the same operands. This restriction forbids some attractive
optimizations. In Fig. 2(a), x := a + b at node 2 and x := a + c * 3 at node 3 have
the same left-hand side variable, but their right-hand sides differ. Thus,
sinking these assignment statements to node 4 is blocked. If a temporary
variable for the subexpression c * 3 in x := a + c * 3 is renamed to the same name
as variable b, x := a + b can be sunk to node 46 after y := x at node 4 has been
sunk, as shown in Fig. 2(b).

We propose an efficient technique to eliminate partially dead assignments,
including such renaming effects for operands. This effect is obtained
systematically by transformation of our graph representation, called the extended
value graph (EVG) [17].

Our partial dead code elimination is realized by dataflow analysis on both 
control flow graph (CFG) and EVG. This analysis also contributes to reducing 
optimization cost. PDE consists of two steps, i.e. sinking and elimination of 
assignments, both of which can mutually influence each other. This means that 
each step may cause second order effects to another step. To complete all possible 
eliminations, PDE must cover the following four types of second order effects [11]: 

Sinking-Elimination Effects: An assignment is sunk until it can be eliminated
by dead code elimination.

Sinking-Sinking Effects: The sinking of an assignment may open the way
for other assignments to sink, if it is a use- or redefinition site for these
assignments or if it modifies an operand of their right-hand side expression.

Elimination-Sinking Effects: The elimination of dead assignments may enable
the sinking of other assignments.

Elimination-Elimination Effects: The elimination of dead assignments may
enable the elimination of other assignments.

Sinking-elimination effects can be captured by a single application of PDE, 
but the other effects are obtained by repeated applications of PDE. This repe- 
tition costs a lot. The reasons why PDE cannot capture second order effects as 
first order effects are as follows: 















Partial Dead Code Elimination Using Extended Value Graph 






Fig. 2. Illustrating missed optimizations. 



1. Sinking of a statement s ≡ x := t is blocked by a definition of x (s1), a use of
x (s2), or a definition of an operand variable of t (s3). The blocking points
for s are the original program points of s1, s2 and s3, even though s1, s2 and
s3 may themselves sink.

2. Elimination of all uses of x makes the elimination of further definitions of x
possible; this effect cannot be captured by a single application of PDE¹.

In our approach, s is not blocked by s1 or s3 of item 1 because our input
is assumed to be in static single assignment form. Nor does s2 of item 1
block the sinking of s, because the sinking of a use of variable x is propagated,
as sinking of the blocking point, from the use to the definition through an EVG
edge. Furthermore, the problem in item 2 is resolved by introducing a concept of
dead variable based on the EVG. Dead variables are defined by the program points
and variables which cannot be reached backward from statements forced to be
alive, through both the CFG and the EVG. As a result, our approach needs only two
steps: determination of dead variables and computation of possible sinking.

This paper is organized as follows. In section 2, we give the internal
representation of programs to be analyzed for PDE. In section 3, we define the
EVG, our main work graph, and show how to build it. In section 4,
we describe PDE based on the EVG. In section 5, we discuss the complexity of the
algorithms, and in section 6 we present our experimental results. Finally, we
discuss related work and conclude.

¹ This is the first order effect for faint code elimination [11].
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2 Program Representation 

We represent programs as a CFG, $CFG = (N, E, s, e)$, with node set $N$
representing basic blocks, edge set $E$ representing the flow of control, and $s$
and $e$ representing the unique start node and end node of the CFG, each containing
the empty statement skip. For $(n_1, n_2) \in E$, $n_2 = succ_i(n_1)$ (or
$n_1 = pred_i(n_2)$) denotes that $n_2$ (or $n_1$) is the $i$-th successor (or
predecessor) of $n_1$ (or $n_2$).

A basic block contains a sequence of statements whose elementary operations
are close to what a single functional unit of a machine will do. The statements
are classified into the following three groups: assignment statements of the
form X := Y op Z, in which the right-hand side is a simple (single operator
op) expression; the empty statement skip; and relevant statements, which force
all their operands to be alive.

For ease of presentation, the relevant statements are given here by explicit
output operations of the form out(t)². Furthermore, we suppose the program is
represented in Static Single Assignment (SSA) form [3,6,15].






Fig. 3. Eliminating critical edges. 



PDE is blocked if there exist critical edges in the flow graph, i.e. edges
leading from a node with more than one successor to a node with more than
one predecessor. In Fig. 3(a), the assignment y := a + b at node 1 is partially
dead with respect to the assignment at node 3. However, this assignment cannot
be safely eliminated by sinking it to its successors, because this sinking may
introduce a new assignment on a path entering node 2 on the left branch. On
the other hand, it can be safely eliminated by inserting a synthetic node
$S_{1,2}$ in the critical edge (1,2), as illustrated in Fig. 3(b).

We restrict our attention to programs in which every critical edge has
been split by inserting a synthetic node³ [10,16]. For example, we suppose the
program shown in Fig. 2(a) has already been transformed as in Fig. 8(a).

² In practice, control expressions in if-statements and assignments to non-local
variables are considered relevant as well [11].

³ In order to keep the presentation of the motivating example simple, we omit
synthetic nodes that are not relevant.
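Critical edge splitting as described above can be sketched as follows. This is a minimal illustration under our own assumptions: the CFG is given as a successor map in which every node appears as a key, synthetic nodes are named `S_m_n`, and the example graph is hypothetical rather than a copy of Fig. 3.

```python
def split_critical_edges(succs):
    """Return a new successor map in which every critical edge (m, n) is
    replaced by m -> S_m_n -> n, where S_m_n is a fresh synthetic node."""
    preds = {n: [] for n in succs}
    for m, targets in succs.items():
        for n in targets:
            preds[n].append(m)

    result = {m: list(targets) for m, targets in succs.items()}
    for m, targets in succs.items():
        for i, n in enumerate(targets):
            # Critical: the source branches and the target is a join point.
            if len(targets) > 1 and len(preds[n]) > 1:
                synthetic = f"S_{m}_{n}"
                result[m][i] = synthetic
                result[synthetic] = [n]
    return result


# Hypothetical CFG: node 1 branches to 2 and 3, and node 2 is also entered
# from node 4, so (1, 2) is the only critical edge.
succs = {0: [1, 4], 1: [2, 3], 4: [2], 2: [5], 3: [5], 5: []}
result = split_critical_edges(succs)
print(result)
```

After splitting, an assignment sunk from node 1 towards node 2 can be placed in the synthetic node without affecting the path that enters node 2 from node 4.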
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3 Extended Value Graph 

We propose the data structure EVG which represents the global use-def relations 
before and after code motion preserving the same semantics. EVG is not only a 
work graph for the subsequent analysis but also used to ease the restrictions on 
PDE that assignment statements sunk to join point must be lexically identical. 




Panel (c), SSA form:

    a1 := ...
    x1 := a1 + b0

    x2 := a0 + b0

    x3 := φ(x1, x2)
    a2 := φ(a1, a0)

Fig. 4. Code sinking in normal form, SSA form, and graph representation



In Fig. 4, sinking from (a) to (b) in normal form corresponds to sinking from (c)
to (d) in SSA form. x3 is defined by a φ-function in (c), but by a
normal expression in (d). Thus, in SSA form, sinking an assignment statement
beyond a φ-function that has the left-hand side variable of the statement
as one of its arguments causes its operands to be renamed. Moreover, (c) and (d)
are represented as (e) and (f) respectively in the graph representation, which
consists of nodes labeled by operator symbols, function names or constants, and
edges from use to definition⁴. In the graph representation, renaming operands
corresponds to changing the graph structure. When statements sink beyond a
φ-function joining their left-hand side variables, the graph node corresponding to
the left-hand side variable of the φ-function must be changed to a node labeled
by a normal operator.

The original node and the changed node, which denote the same value, are
represented by a single EVG node, as shown in Fig. 5(a).



3.1 Definition of EVG 

An EVG node consists of two sub-nodes, and an EVG edge is the directed edge 
from a sub-node to an EVG node. All sub-nodes have an operator symbol, a 
function name, or a constant as a label. 

The EVG is precisely defined as follows, where $C$ is the constant set, $Op$ is the
operator set, $\Phi$ is the φ-function set $\{\phi_i \mid i$ is the CFG node where
$\phi_i$ exists$\}$, and $\bot$ is a special constant denoting an unknown value.

Definition 3.1 An EVG is a tuple $EVG = (SV, V, A)$, where

- $(SV, V, A)$ is a directed graph with sub-node set $SV = SV_{op} \cup SV_{\phi}$,
  node set $V =_{def} \{(sv_{op}, sv_{\phi}) \mid sv_{op} \in SV_{op} \wedge sv_{\phi} \in SV_{\phi}\}$,
  and edge set $A \subseteq SV \times V$.
- $op : V \to SV_{op}$ is a function to extract $sv_{op}$ from
  $(sv_{op}, sv_{\phi}) \in V$.
- $phi : V \to SV_{\phi}$ is a function to extract $sv_{\phi}$ from
  $(sv_{op}, sv_{\phi}) \in V$.
- $lb : SV \to Op \cup C \cup \Phi \cup \{\bot\}$ is a function to get the label of
  a sub-node, where $lb(sv_{op}) \in Op \cup C \cup \{\bot\}$ and
  $lb(sv_{\phi}) \in \Phi \cup \{\bot\}$ ($sv_{op} \in SV_{op}$ and
  $sv_{\phi} \in SV_{\phi}$).
- $loc : SV \to N$ is a function to get the original location of the statement
  corresponding to the argument sub-node.
- $child_i : SV \to V$ is a function to get the destination $v$ of the $i$-th edge
  $(sv, v) \in A$ outgoing from the argument $sv$.

Furthermore, the following sets are used subsequently:

- $Child_{op}(v) =_{def} \{v' \mid v' = child_i(op(v)) \wedge i = 1, 2, \ldots\}$ and
  $Child_{\phi}(v) =_{def} \{v' \mid v' = child_i(phi(v)) \wedge i = 1, 2, \ldots\}$
- $Parent_{op}(v) =_{def} \{v' \mid op(v') = child_i^{-1}(v) \wedge i = 1, 2, \ldots\}$ and
  $Parent_{\phi}(v) =_{def} \{v' \mid phi(v') = child_i^{-1}(v) \wedge i = 1, 2, \ldots\}$

⁴ This graph representation is called value graph (VG) [2]. In VG, for all trivial
assignments (of the form A := B), references to A in the succeeding expressions
are substituted with references to B. In our approach, however, such substitution
is applied only when the assigned variable is a temporary, because our attention
is given to each assigned variable in the source program.

Each EVG node $(sv_{op}, sv_{\phi})$ corresponds to a variable $var_i$ in SSA form
and represents the use-def structure which starts at a use of $var_i$. The use-def
structure is represented by a sub-graph whose root is $sv_{\phi}$ before
assignments to $var$ are sunk to $loc(sv_{\phi})$, and by a sub-graph whose root is
$sv_{op}$ after they have been sunk to $loc(sv_{\phi})$.

An EVG is built in two steps: an initial step and a transformation step. The
EVG constructed from SSA form at the initial step is called an initial
EVG.



3.2 Building of EVG 

Each edge of the initial EVG corresponds to a connection between a use of a
variable and the definition for that use. For a trivial assignment whose left-hand
side variable is a temporary (one of the form T := B), the definition for T is
treated as that for B. Each node of the initial EVG corresponds to an individual
operator or function in the program, and each node has at least one sub-node
labeled $\bot$. The initial EVG is built directly from the VG by changing each VG
node to an EVG sub-node, generating an initial EVG node by coupling each sub-node
with a new sub-node labeled $\bot$, and changing the destination of each edge from
a sub-node to the node including it. Fig. 5(b) shows an initial EVG for Fig. 2(a).
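The construction of an initial EVG from a value graph can be sketched as below. This is an illustration under our own assumptions, not the authors' implementation: the value graph is a dictionary from SSA names to (label, location, operands), φ labels are written as strings starting with `phi`, the bottom label is the string `_|_`, and the CFG node numbers in the example are hypothetical. The statements are those of Fig. 4(c).

```python
from dataclasses import dataclass, field

BOTTOM = "_|_"  # label standing for the unknown value


@dataclass
class SubNode:
    label: str                                      # operator, constant, phi name, or BOTTOM
    loc: object = None                              # CFG node of the originating statement
    children: list = field(default_factory=list)    # names of the EVG nodes it points to


@dataclass
class EVGNode:
    op: SubNode     # the operator sub-node
    phi: SubNode    # the phi sub-node


def build_initial_evg(value_graph):
    """Couple every value-graph node with a fresh sub-node labeled BOTTOM."""
    evg = {}
    for name, (label, loc, operands) in value_graph.items():
        real = SubNode(label, loc, list(operands))
        if label.startswith("phi"):
            evg[name] = EVGNode(op=SubNode(BOTTOM, loc), phi=real)
        else:
            evg[name] = EVGNode(op=real, phi=SubNode(BOTTOM, loc))
    return evg


value_graph = {
    "a0": ("a0", 0, []),
    "b0": ("b0", 0, []),
    "a1": ("a1", 1, []),
    "x1": ("+", 1, ["a1", "b0"]),
    "x2": ("+", 2, ["a0", "b0"]),
    "x3": ("phi3", 3, ["x1", "x2"]),
    "a2": ("phi3", 3, ["a1", "a0"]),
}
evg = build_initial_evg(value_graph)
print(evg["x3"])
```

Edges are stored as names of EVG nodes rather than of sub-nodes, which mirrors the step of redirecting each edge to the node that includes its original target.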






Fig. 5. EVG for Fig.3, initial EVG and complete EVG for Fig.2(a).

At the next step, an initial EVG is transformed to represent new computational
structures which reflect the results of code sinking. As shown in Fig. 5(a), the
code sinking transformation is applied to an EVG node with a sub-node labeled
with φ, called a φ-node. This means that the statements corresponding to the
children of a φ-node sv are merged into one statement at the join point loc(sv).
The right-hand sides of the merged statements are then required to be lexically
identical expressions, consisting of the same operator and operands. In SSA
form, however, the condition on merged operands differs from that in normal form:

1. the operands are the same variable or constant, or

2. the operands are arguments of the same φ-function.

Item 2 follows from the fact that a φ-function joins definitions of a variable
in normal form.

Since an initial EVG is built directly from a program in SSA form, its code
sinking transformation is derived from that in SSA form.

To be a candidate for the transformation, an EVG node $(sv_{op}, sv_{\phi})$ must
satisfy the following conditions:

1. $lb(sv_{op}) = \bot \wedge lb(sv_{\phi}) \neq \bot$,

2. for all $i$, every $lb(sv_i)$ is the same label $l$, where
$sv_i = op(child_i(sv_{\phi}))$, and

3. for all $i$, every $child_j(sv_i)$ is the same node $v_j$, or there exists a
node $v_j$ with a φ-node such that
$child_j(sv_i) \in Child_{\phi}(v_j) \wedge lb(phi(v_j)) = lb(sv_{\phi})$,
where $j = 1, 2, \ldots$

Using the above notation, the code sinking transformation is simply described
as follows:

1. assigning $l$ to $lb(sv_{op})$, and

2. adding each edge $(sv_{op}, v_j)$ to $A$.

Fig. 6 shows all patterns of the EVG transformation. When an EVG node
matches the root node of the left-hand side of one of these four patterns, it
becomes a candidate to be replaced by the corresponding right-hand side. We obtain
the final result for all possible sinking by applying these transformations until
no candidate remains.
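The candidate test and the transformation can be sketched on the example of Fig. 4. The code is a simplified illustration with assumed names (`make`, `merged_operands`, `sink`) and a dictionary-based EVG. For condition 3 it uses a slightly stricter reading than the text: the joining φ-node must list the differing operands in the same argument order.

```python
BOTTOM = "_|_"


def make(op_label=BOTTOM, op_args=(), phi_label=BOTTOM, phi_args=()):
    """One EVG node: an operator sub-node and a phi sub-node with their edges."""
    return {"op": op_label, "op_args": list(op_args),
            "phi": phi_label, "phi_args": list(phi_args)}


def merged_operands(evg, name):
    """Return (label, operands) if node `name` is a sinking candidate, else None."""
    v = evg[name]
    if v["op"] != BOTTOM or v["phi"] == BOTTOM:            # condition 1
        return None
    kids = [evg[c] for c in v["phi_args"]]
    labels = {k["op"] for k in kids}
    if len(labels) != 1 or BOTTOM in labels:               # condition 2
        return None
    arity = {len(k["op_args"]) for k in kids}
    if len(arity) != 1:
        return None
    operands = []
    for j in range(arity.pop()):                           # condition 3
        column = [k["op_args"][j] for k in kids]
        if len(set(column)) == 1:                          # same node everywhere
            operands.append(column[0])
            continue
        joins = [w for w, node in evg.items()
                 if node["phi"] == v["phi"] and node["phi_args"] == column]
        if not joins:
            return None
        operands.append(joins[0])                          # joined by the same phi
    return labels.pop(), operands


def sink(evg, name):
    """Apply the code sinking transformation to `name` if it is a candidate."""
    found = merged_operands(evg, name)
    if found is None:
        return False
    evg[name]["op"], evg[name]["op_args"] = found
    return True


evg = {
    "a0": make("a0"), "b0": make("b0"), "a1": make("a1"),
    "x1": make("+", ["a1", "b0"]),
    "x2": make("+", ["a0", "b0"]),
    "x3": make(phi_label="phi3", phi_args=["x1", "x2"]),
    "a2": make(phi_label="phi3", phi_args=["a1", "a0"]),
}
changed = sink(evg, "x3")
print(changed, evg["x3"])
```

After the transformation the node for x3 carries both views of the value: the φ sub-node still joins x1 and x2, while the operator sub-node represents x3 := a2 + b0, the statement obtained after sinking.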

Fig. 6. Transformation patterns.

In Fig. 8(a), the sinking of the statements assigning to x0 and x1 is blocked at
the predecessors of nodes 3 and 4 respectively, because their second operands t0
and b0 are neither the same variable nor arguments of the same φ-function, which
means they are not the same variable in normal form. Therefore, EVG nodes
x2 and x3 joining x0 and x1 do not become candidates for the transformation,
as shown in Fig. 5(b). However, if φ-functions with arguments t0 and b0 are
inserted into nodes 3 and 4, the two statements can be sunk further. In the EVG,
this corresponds to the insertion of synthesized nodes⁵ with the φ-node joining t0
and b0. In order to make a node $v$ a candidate for the transformation, the
inserted nodes $nv$ must satisfy the following condition:

$lb(op(nv)) = \bot \wedge lb(phi(nv)) = lb(phi(v))$

In Fig. 5(c), the dotted rectangles tb0 and tb1 represent the synthesized nodes.
Dotted arrows represent edges added to the initial EVG.

⁵ We suppose that a synthesized node cannot be a candidate for the code sinking
transformation, to avoid the insertion of many copy assignments.

4 PDE based on EVG 

In this section, we present our PDE, which solves the second order effect problem
faced by previous work. It is achieved by a dataflow analysis on the CFG
and the EVG, which allows such effects to be captured as first order effects.

We suppose that φ-functions are pinned at their original program points and
have no side effects. This means that φ-functions only serve to propagate dataflow
information, which allows us to treat the different variables joined by a
φ-function as a single variable. In our dataflow analysis, each variable is
assigned to a separate slot of a bit-vector. To treat the different slots
representing a variable as a single slot, we define neighbor slots. Then, we
describe our PDE on the slots and the EVG, and present the dataflow equations.
Finally, we illustrate the dataflow analysis using an example.

4.1 Neighbor slot 

Since each slot is assigned to a variable in SSA form, we identify each slot by a
pair of the EVG node corresponding to the variable and a CFG node; the slot
set is $SL = V \times N$. We represent the neighbor slots of a slot
$sl = (v, n) \in SL$ by providing its $i$-th predecessor $pred_{SL_i}(sl)$ and
$i$-th successor $succ_{SL_i}(sl)$ as follows:

- $pred_{SL_i}(sl)$:
  - if $loc(phi(v)) = n$, then $(child_i(phi(v)), pred_i(n))$;
  - if $\exists v' \in V.\ loc(phi(v')) = n \wedge v \in Child_{\phi}(v')$, then nothing;
  - otherwise, $(v, pred_i(n))$.
- $succ_{SL_i}(sl)$:
  - if $\exists v' \in V.\ v' \in Parent_{\phi}(v) \wedge loc(phi(v')) = succ_i(n)$,
    then $(v', succ_i(n))$;
  - if $succ_i(n) = loc(phi(v))$, then nothing;
  - otherwise, $(v, succ_i(n))$.

We also use the following sets:

- $pred_{SL}(sl) =_{def} \{sl' \mid sl' = pred_{SL_i}(sl) \wedge i = 1, 2, \ldots\}$ and
- $succ_{SL}(sl) =_{def} \{sl' \mid sl' = succ_{SL_i}(sl) \wedge i = 1, 2, \ldots\}$.

Furthermore, we define a set of start slots and a set of end slots as follows:

- $SL_s =_{def} \{sl \mid pred_{SL}(sl) = \emptyset\}$, and
- $SL_e =_{def} \{sl \mid succ_{SL}(sl) = \emptyset\}$.

In Fig. 8(b), each slot is drawn as a solid rectangle, with a variable
name attached to show the corresponding EVG node. If sl is a neighbor slot of a
slot sl' with a different variable name from sl, they are linked by a solid edge.
Otherwise, sl at CFG node n is a neighbor slot of the slot sl' with the same
variable name at the successor or predecessor of n; dotted rectangles and arrows
represent CFG nodes and CFG edges, respectively.
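The neighbor slot definitions can be sketched directly as two functions. The data layout is our own assumption: `phi_loc` maps a φ-defined variable to the CFG node of its φ-function, `phi_args` lists that φ-function's arguments in predecessor order, and the small CFG is hypothetical.

```python
def pred_slots(slot, preds, phi_loc, phi_args):
    """All predecessor slots of `slot` = (variable, CFG node)."""
    v, n = slot
    if phi_loc.get(v) == n:
        # v is defined by a phi at n: the i-th argument at the i-th predecessor.
        return [(arg, p) for arg, p in zip(phi_args[v], preds[n])]
    if any(phi_loc[w] == n and v in phi_args[w] for w in phi_loc):
        return []                      # v is an argument of a phi at n: nothing
    return [(v, p) for p in preds[n]]


def succ_slots(slot, succs, phi_loc, phi_args):
    """All successor slots of `slot` = (variable, CFG node)."""
    v, n = slot
    result = []
    for s in succs[n]:
        joins = [w for w in phi_loc if phi_loc[w] == s and v in phi_args[w]]
        if joins:
            result.extend((w, s) for w in joins)   # continue in the joined variable
        elif phi_loc.get(v) == s:
            continue                               # nothing
        else:
            result.append((v, s))
    return result


# Nodes 1 and 2 both flow into node 3, where x3 := phi(x1, x2); node 4 follows.
preds = {1: [], 2: [], 3: [1, 2], 4: [3]}
succs = {1: [3], 2: [3], 3: [4], 4: []}
phi_loc = {"x3": 3}
phi_args = {"x3": ["x1", "x2"]}

print(pred_slots(("x3", 3), preds, phi_loc, phi_args))
print(succ_slots(("x1", 1), succs, phi_loc, phi_args))
```

Walking these links treats x1, x2 and x3 as one variable along each path, which is the purpose of neighbor slots in the analysis that follows.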

4.2 PDE 

An occurrence of an assignment of the form x := t at a CFG node n is dead if its
left-hand side variable x is dead. Dead variables can be computed by a backward
dataflow analysis. Moreover, in order to characterize all assignments
that are of no use for any relevant computation, the concept of faint code is
used [11]. An occurrence of an assignment s ≡ x := t at a CFG node n is
faint if its left-hand side variable x is either faint or dead. Faint variables can
be computed by a slot-wise dataflow analysis [9,7], where the assignment statement
s is influenced not only by the successors of the x-slot but also by the
statements using x.
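The difference between dead and faint assignments can be illustrated on straight-line code. The sketch below is a simplification that handles a single basic block only, so no fixpoint iteration is needed. The statement encoding, with `None` as the target of an out statement, is our own.

```python
def dead_assignments(stmts):
    """Indices of assignments whose target is not live afterwards (plain liveness)."""
    live, dead = set(), []
    for idx in range(len(stmts) - 1, -1, -1):
        target, operands = stmts[idx]
        if target is None:               # relevant statement out(...)
            live |= set(operands)
            continue
        if target not in live:
            dead.append(idx)
        live.discard(target)
        live |= set(operands)            # operands count as used even if the assignment is dead
    return sorted(dead)


def faint_assignments(stmts):
    """Indices of assignments that contribute to no relevant statement."""
    needed, faint = set(), []
    for idx in range(len(stmts) - 1, -1, -1):
        target, operands = stmts[idx]
        if target is None:
            needed |= set(operands)
        elif target in needed:
            needed.discard(target)
            needed |= set(operands)
        else:
            faint.append(idx)            # its operands are deliberately not marked as needed
    return sorted(faint)


program = [
    ("a", ["c"]),      # a := c
    ("b", ["a"]),      # b := a   (b is never used)
    ("d", ["c"]),      # d := c
    (None, ["d"]),     # out(d)
]
dead = dead_assignments(program)
faint = faint_assignments(program)
print(dead, faint)
```

Plain liveness only finds b := a, because a is still read by that statement. The faint analysis also finds a := c in the same pass, which corresponds to the elimination-elimination effect being captured as a first order effect.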

Our concept of dead code is similar to faint code, except for the program point
to which the influence of a statement using x is propagated. When computing
faint variables, the influence is propagated from the program points using a
variable to the program points defining it. These points are the original program
points of each statement. In contrast, since we want to capture all second
order effects of sinking, we treat an assignment statement s ≡ x := t as if
the use s' of its left-hand side x had been sunk to the point nearest to the
relevant statements using the value computed by s'. Namely, the influence is
propagated backward from the slot rsl of a relevant statement, and the influence
on a slot (v, n) is simultaneously propagated to each slot (v', n), where v' is a
child of v, through the EVG edge.







In Fig. 7(a), $DEAD_{(v,n)}$ means that the variable corresponding to EVG node $v$
is dead at the entry of CFG node $n$, where the influence of DEAD is propagated by
USED at the point nearest to the relevant statements. The propagation by USED
continues whenever the influence of DEAD is propagated from a slot $(v, n)$
to a slot $(v', n')$ such that $v' \neq v$, as illustrated by the propagation from
y0 to x3 at CFG node 46 in Fig. 8(b). This guarantees that the influence is
propagated to every sub-graph of the EVG that must be influenced.

After determining the dead variables, we compute the possible sinking. Sinking of
a statement is possible while its left-hand side variable is not dead and the
sinking does not increase the number of statements on any path.



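(a)

$$USED_{(v,n)} =_{def} \begin{cases}
true & \text{if } lb(op(v)) \text{ is relevant} \wedge loc(op(v)) = n \\
\bigvee_{(v',n') \in succ_{SL}(v,n) \,\wedge\, v' \neq v} \neg DEAD_{(v',n')} \;\vee\; \bigvee_{v' \in Parent_{op}(v)} USED_{(v',n)} & \text{otherwise}
\end{cases}$$

$$DEAD_{(v,n)} =_{def} \neg USED_{(v,n)} \wedge \bigwedge_{(v',n') \in succ_{SL}(v,n)} DEAD_{(v',n')}$$

Fig. 7. Dataflow equation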



As soon as such sinking becomes impossible, that influence is propagated
to its definition. This enables all statements to be sunk simultaneously without
being blocked by the original program point of the use of each statement.

In Fig. 7(b), $SUNK_{(v,n)}$ means that the assignment statement which defines
the variable corresponding to $v$ can be sunk to CFG node $n$. Note that the
slot with an EVG node $v$ such that $lb(op(v)) = \bot$ is initialized to false at
$loc(phi(v))$. BLOCKED propagates the information that sinking is impossible from
a slot $(v, n)$ to the slots of the children of $v$, in order to preserve def-use
orders.

$DEAD_{(v,n)}$ and $SUNK_{(v,n)}$ can be computed by an iterative worklist
algorithm operating slot-wise on bit-vectors, in the same way as the computation
of faint variables [11]. Fig. 8(b) shows the greatest solution of this equation
system⁶. This solution gives the program points where statements must be
inserted, by means of $INSERT_{(v,n)}$ in Fig. 7(c). In Fig. 8(b), the INSERTs of
out at 6, y0 at 46 and x3 at 46 are true.

⁶ We show only a part of the entire result.

Fig. 8. An input program and solution of dataflow analysis for Fig. 2(a)

The actual program transformation is simple, as described below.

1. At each CFG node $n$, visit each EVG node $v$ such that $INSERT_{(v,n)} = true$
in postorder, and

2. at the visited EVG node $v$, insert a statement
"$Var(v)$ := $Var(child_1(op(v)))$ $lb(op(v))$ $Var(child_2(op(v)))$", where
$Var(v)$ is the variable name corresponding to EVG node $v$.

φ-functions $lb(phi(v))$ such that $SUNK_{(v,n)} = false$ also need to be
inserted at the entry of $loc(phi(v))$. Finally, each φ-function of the form
A := φ(B, C) is replaced by the assignment A := B at the exit of one predecessor
and by A := C at the exit of the other, in order to translate SSA form into normal
form.
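The final replacement of φ-functions by copies can be sketched as follows. This is a naive illustration under our own assumptions: blocks hold only non-branch statements as strings, critical edges have already been split, and φ arguments are listed in predecessor order. It ignores the ordering problems that can arise when several φ-functions in one block read each other's targets.

```python
def eliminate_phis(blocks, preds, phis):
    """Replace each phi by one copy at the exit of every predecessor.

    blocks: node -> list of statement strings
    preds:  node -> list of predecessor nodes
    phis:   node -> list of (target, [one argument per predecessor])
    """
    out = {n: list(stmts) for n, stmts in blocks.items()}
    for n, phi_list in phis.items():
        for target, args in phi_list:
            for p, arg in zip(preds[n], args):
                out[p].append(f"{target} := {arg}")
    return out


blocks = {1: ["B := 1"], 2: ["C := 2"], 3: ["out(A)"]}
preds = {1: [], 2: [], 3: [1, 2]}
phis = {3: [("A", ["B", "C"])]}     # A := phi(B, C) at node 3

normal_form = eliminate_phis(blocks, preds, phis)
print(normal_form)
```

The join block itself is left unchanged; only its predecessors receive a copy each, matching the A := B and A := C placement described above.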

5 Complexity 

We use the following parameters to compare the complexity of our algorithm with
that of [11].

- I: the number of instructions,
- W: the maximal factor by which the number of instructions may increase
  during the application of the original algorithm,
- V: the number of variables, and
- N: the number of basic blocks.

To estimate the complexity of our overall algorithm, we must consider the
complexities of the construction of the EVG and of the dataflow analysis. In the
construction of the EVG, the code sinking transformation is applied at an EVG
node with a left sub-node labeled $\bot$ and a right sub-node labeled with a
φ-function. Since the number of φ-functions is less than V · N, the construction
of the EVG requires O(V · N) in the worst case. In the dataflow analysis, the
number of slots is given by N · I, and each predicate can be updated once by
executing only two elementary operations per CFG edge. Thus, the dataflow analysis
requires O(N · I) in the worst case. Therefore, the overall complexity of our
algorithm is O(N · I). Expressed more roughly in terms of a uniform
parameter n, it is $O(n^2)$. In comparison with this, PDE given by [11] is of
0(W • V • I^) in the worst case. It is of O(n^) in terms of an uniform parameter, 
where it is O(n^) under reasonable assumption. 

6 Experimental result 

To demonstrate the effectiveness of our approach, we have implemented PDE
based on EVG as an optimization phase of an experimental C compiler. The front
end of the compiler translates a source program to a Medium-Level Intermediate
Representation (MIR) [12]. The optimization phase consumes and produces MIR
code. The code generation phase consumes the MIR and produces C code, which
is then compiled by gcc and linked as a native program. The executed code
contains annotations that collect execution cycle counts for the MIR. The
performance test executes 23 routines, drawn from two benchmarks and
four well-known algorithms, as shown in Table 1. The first two columns show the
application and function name; the next three columns show the execution cycle
counts for unoptimized code, for the previous PDE proposed by [11], and for our
approach, respectively.

The last two columns show the ratio of execution time of our approach to
the previous PDE and of our approach to unoptimized code, respectively. These
experimental results show that the best ratio was 0.938 to the previous PDE and
0.825 to unoptimized code. No routine became slower.
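The ratio columns can be recomputed from the cycle counts. The sketch below uses three rows copied from Table 1. The helper name `truncated_ratio` is ours, and truncation to three decimals is an assumption that happens to reproduce the printed values for these rows.

```python
def truncated_ratio(numerator, denominator):
    """Ratio cut off (not rounded) after three decimal places."""
    return (numerator * 1000 // denominator) / 1000


rows = {  # (non-PDE, PDE, new) cycle counts copied from Table 1
    ("n-queen", "main"): (280, 233, 231),
    ("hilbert", "main"): (219, 202, 191),
    ("quicksort", "main"): (90403, 84147, 84146),
}

for (program, function), (base, pde, new) in rows.items():
    print(f"{program}/{function}: "
          f"new/PDE = {truncated_ratio(new, pde):.3f}, "
          f"new/non-PDE = {truncated_ratio(new, base):.3f}")
```

Integer arithmetic is used for the truncation so that the result does not depend on floating-point rounding of the intermediate quotient.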



7 Related work 

The importance of PDE was pointed out by [8] and [11]. The algorithm of [8] is
characterized by introducing more complex statements as movement candidates
whenever elementary statements are blocked. Thus, it may modify the branching
structure of the program under consideration. That approach cannot deal with
loop-invariant code motion, and has the restriction that a complex partially dead
statement is only placed at a single later point where it is live. With this
restriction, some second order movements may still be missed.

On the other hand, the algorithm of [11] can deal with loop-invariant code motion
and all second order effects without modifying the branching structure.
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Table 1. Experimental Results. 



programs       functions   non-PDE    PDE        new        new/PDE  new/non-PDE
linpack        main        7190       6661       6256       0.939    0.870
               print_time  368        368        360        0.978    0.978
               epslon      23         20         19         0.950    0.826
               dmxpy       147774     146361     146361     1.000    0.990
               dgesl       213252     192504     187330     0.973    0.878
               dgefa       6685068    6286202    6154928    0.979    0.920
               daxpy       184404012  181715066  181715066  1.000    0.985
               dscal       1791452    1751698    1751698    1.000    0.977
               idamax      2071602    1932138    1803438    0.933    0.870
               matgen      12219498   10853028   10853028   1.000    0.888
whetstone      main        1419378    1371296    1341777    0.978    0.945
               p0          190960     190960     190960     1.000    1.000
               p3          206770     188790     188790     1.000    0.913
n-queen        main        280        233        231        0.991    0.825
               backtrack   483671     450759     447875     0.993    0.925
quicksort      main        90403      84147      84146      0.999    0.930
hilbert        main        219        202        191        0.945    0.872
               C           2076       2028       1916       0.944    0.922
               B           3172       3091       2902       0.938    0.914
               A           4612       4486       4192       0.934    0.908
               D           3172       3091       2902       0.938    0.914
shortest path  main        2035       1942       1939       0.998    0.952
               init        896        812        812        1.000    0.906



These approaches depend on right-hand side form, and may lose attractive 
optimizations our approach can deal with. Although our approach is classified 
into an extension to algorithm of [11], it is more efficient than that algorithm. In 
contrast to complexity O(n^) of the latter approach under reasonable assump- 
tion, the complexity of our approach is of O(n^). 

There are some simplified versions of PDE [4,5], but their effects are
restricted, i.e., assignments cannot be sunk beyond a φ-function.

Recent work by [14] proposes an aggressive approach using slicing
transformations. That approach, however, takes exponential time on arbitrary
input programs. Another recent work by [13] proposes an approach that moves dead
assignments along frequently executed paths at the expense of adding
instructions along infrequently executed paths. This approach requires path
profiling information. These recent works also depend on the right-hand side form.

8 Conclusion 

We proposed a new effective and efficient algorithm for PDE. Our approach
not only places no restriction on the right-hand side form of statements, but also
efficiently captures all second order effects by turning them into first order
effects. We showed the effectiveness of our approach using an experimental
compiler based on the algorithm proposed here.
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Abstract. Programs represented in Static Single Assignment (SSA) form 
contain phi instructions (or functions) whose operational semantics are to merge 
values coming from distinct control flow paths. However, translating phi 
instructions into native instructions is nontrivial when transformations such as 
copy propagation and code motion have been performed. In this paper we 
present a new framework for translating out of SSA form. By appropriately 
placing copy instructions, we ensure that none of the resources in a phi 
congruence class interfere. Within our framework, we propose three methods 
for copy placement. The first method pessimistically places copies for all 
operands of phi instructions. The second method uses an interference graph to 
guide copy placement. The third method uses both data flow liveness sets and 
an interference graph to guide copy placement. We also present a new SSA- 
based coalescing method that can selectively remove redundant copy 
instructions with interfering operands. Our experimental results indicate that the 
third method results in 35% fewer copy instructions than the second method. 
Compared to the first method, the third method, on average, inserts 89.9% 
fewer copies during copy placement and runs 15% faster, which are significant 
reductions in compilation time and space. 



1. Introduction 

Static Single Assignment (SSA) form is an intermediate representation that facilitates 
the implementation of powerful program optimizations [7, 12, 13], where each 
program name is defined exactly once and phi (φ) instructions (or nodes) are inserted 
at confluent points to merge multiple values into a single name. Phi instructions are 
not directly supported on current architectures, and hence they must be eliminated 
prior to final code generation [7]. However, translating out of SSA form is nontrivial 
when certain transformations, such as copy folding and code motion, have been 
performed. Most of the previous work on SSA form has concentrated either on 
efficiently constructing the representation [7, 11], or on proposing new SSA-based 
optimization algorithms [4, 5, 12]. 

We are aware of the following published articles related to translating out of SSA 
form. Cytron et al. [7] proposed a simple algorithm for removing a k-input phi 
instruction by placing ordinary copy instructions (or assignments) at the end of every 
control flow predecessor of the basic block containing the phi instruction. Cytron et 
al. then used Chaitin’s coalescing algorithm to reduce the number of copy instructions 
[3]. The work in [2] showed that Cytron et al.’s algorithm cannot be used to correctly 
eliminate phi instructions from an SSA representation that has undergone 
transformations such as copy folding and value numbering. To address this, Briggs et 
al. [2] proposed an alternative solution for correctly eliminating phi instructions. 
Briggs et al. exploit the structural properties of both the control flow graph and the 
SSA graph of a program to detect particular patterns, and use liveness information to 
guide copy insertions for eliminating phi instructions. Any redundant copies 
introduced during phi instruction elimination are then eliminated using Chaitin’s 
coalescing algorithm [3]. Pineo and Soffa [10] used an interference graph and graph 
coloring to translate programs out of SSA form for the purpose of symbolic 
debugging of parallelized code. Leung and George [8] constructed SSA form for 
programs represented as native machine instructions, including the use of machine 
dedicated registers. Upon translating out of SSA form, a large number of copy 
instructions, including many redundant ones, may be inserted to preserve program 
semantics, and they rely on a coalescing phase in register allocation to remove the 
redundant copy instructions. 

In this paper we present a new framework for leaving SSA form and for 
eliminating redundant copies. We introduce the notion of a phi congruence class to 
facilitate the removal of phi instructions. Intuitively, a phi congruence class contains a 
set of resources (or variables) that will be given the same name when we translate out 
of SSA form. The key intuition behind our method for eliminating phi instructions is 
to ensure that none of the resources within a phi congruence class interfere with 
each other. The idea is very similar to the coloring-based register allocation problem [3], 
where if two live ranges interfere they should be given two different physical 
registers. But if there is only one unused physical register available, then one of the 
live ranges should be spilled to eliminate the interference. To break the interferences 
among resources in a phi instruction we introduce “spill code” by placing copy 
instructions. Another unique aspect of our method is that we don’t use any structural 
properties of either the control flow graph or the SSA graph to guide us in the 
placement of copy instructions. 

We present three different methods of varying sophistication for placing copies. 
Our first method is closely related to the copy placement algorithm described in [7] 
except that it correctly eliminates copies even when transformations, such as copy 
folding and code motion, have been performed on the SSA form of a program. This 
method does not explicitly use either liveness or interference information to guide 
copy insertion and placement, and therefore places many more copies than necessary. 
To reduce the number of copies that are needed to correctly eliminate phi instructions, 
our second method uses an interference graph to guide copy placement. Although it 
places fewer copies than the first method, it still places more copies than necessary. 
To further reduce the number of copies, our third method uses both liveness and 
interference information to correctly eliminate phi instructions. A unique aspect of 
this method is that any copies that it places cannot be eliminated by the standard 
interference graph based coalescing algorithm. 

In this paper we also present a new SSA-based coalescing algorithm that can 
eliminate redundant copies even when the source resource (or variable) and the 
destination resource interfere with each other, provided certain constraints are satisfied. 
This algorithm also uses phi congruence classes to eliminate redundant copies. 

We implemented all three methods for copy placement and also implemented our 
SSA based coalescing algorithm. Our experimental results indicate that our third 
method is most effective in terms of the number of copies in the final code, and it is 
also faster than the first method. Note that reducing the compilation time and space 
usage is a key motivation to develop a new method to translate out of SSA form. On 
average we found that our third method inserts 89.9% fewer copies compared to the 
first method. This also means that our third method is an improvement over the 
algorithms proposed by Cytron et al. [7] and Briggs et al. [2] (since they also 
pessimistically insert copies to eliminate phi instructions). 

The rest of the paper is organized as follows. In the next section we will motivate 
the problem of translating out of SSA form, and introduce the notion of phi 
congruence class. Section 3 briefly discusses liveness analysis for programs under 
SSA form. Section 4 presents three different methods for placing copies. Section 5 
presents a new SSA-based coalescing algorithm for eliminating redundant copies. 
Section 6 presents experimental results. Finally, Section 7 discusses related work and 
presents our conclusion. 



2. Motivation and Phi Congruence Class 

When the SSA form of a program is constructed using the algorithm of Cytron et al. 
[7], the representation has the following two properties: 

• Resources (or variables) are appropriately renamed so that each resource has a 
single definition. 

• Phi instructions are introduced to merge multiple definitions coming from distinct 
control flow paths. Each phi instruction has one input operand for each control 
flow predecessor. 

Figure 1 shows a program in SSA form. Each source operand in a phi instruction is a 
pair, x:L, where x is a resource name and L represents the control flow predecessor 
basic block label through which the value of x reaches the phi instruction [6]. 

Given a resource x let phiConnectedResource(x) = {y | x and y are referenced (i.e. 
used or defined) in the same phi instruction, or there exists a resource z such that y 
and z are referenced in the same phi instruction and x and z are referenced in the same 
phi instruction}. We define the phi congruence class phiCongruenceClass[x] to be the 
reflexive and transitive closure of phiConnectedResource(x). Intuitively, the phi 
congruence class of a resource represents a set of resources “connected” via phi 
instructions. The SSA form that is constructed using Cytron et al.’s algorithm (which 
we call the Conventional SSA (CSSA) form) has the following important property. 

• Phi Congruence Property. The occurrences of all resources which belong to the 
same phi congruence class in a program can be replaced by a representative 
resource. After the replacement, the phi instruction can be eliminated without 
violating the semantics of the original program⁵. 

For the example shown in Figure 1, resources x1, x2, and x3 that are referenced in the 
phi instruction belong to the same phi congruence class. Let x be the representative 
resource of the congruence class. We can replace each reference to x1, x2, and x3 by 
x. Once we perform this transformation, we can eliminate the phi instruction. The 
resulting program is shown in Figure 2. 
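
The grouping described above can be sketched with a union-find structure. This is a minimal Python sketch, not the authors' implementation: the function names, the encoding of a phi instruction as a (target, sources) tuple, and the choice of the lexicographically smallest name as the representative are assumptions made for illustration. Figure 1 is assumed here to contain a single phi instruction x3 = phi(x1, x2).

```python
from collections import defaultdict


def phi_congruence_classes(phis):
    """Group resources connected through phi instructions.

    phis: list of (target, [sources]) tuples, one per phi instruction
    (a hypothetical encoding). Returns {resource: frozenset(class)}.
    Resources that appear in no phi instruction are absent from the result.
    """
    parent = {}

    def find(r):
        parent.setdefault(r, r)
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Every operand of a phi joins the class of that phi's target.
    for target, sources in phis:
        for src in sources:
            union(target, src)

    members = defaultdict(set)
    for r in list(parent):
        members[find(r)].add(r)
    return {r: frozenset(members[find(r)]) for r in parent}


def rename_to_representative(names, classes):
    """Replace each name by a representative of its class (here: the minimum)."""
    return [min(classes[n]) if n in classes else n for n in names]


classes = phi_congruence_classes([("x3", ["x1", "x2"])])
renamed = rename_to_representative(["x1", "x2", "x3", "y"], classes)
```

For the assumed phi instruction, all three names land in one class and are renamed to the same representative, while a name that occurs in no phi instruction (y here) is left alone.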





Figure 1. An example program in SSA form. 



Figure 2. An example of translating out of CSSA form. 



Referring back to the example shown in Figure 1, let us coalesce the copy x2=y. 
The resulting SSA form is shown in Figure 3. Now if we use the phi congruence 
property to replace the references of x1, x3, and y with a representative resource, the 
resulting program will not preserve the semantics of the original program. This is 
because, after folding the copy x2=y, x1 and y have overlapping live ranges, and 
hence should not be replaced by the same name. Therefore any interfering resource in 
a phi congruence class should be “spilled” by inserting copies. In other words, we 
should ensure that none of the resources in a phi congruence class interfere with each other. 
The idea is very similar to the coloring-based register allocation problem [3], where if 
two live ranges interfere they should be assigned different physical registers. But if 
there is only one unused physical register available, then one of the live ranges needs 
to be spilled to eliminate the interference. 




Figure 3. An example of TSSA form. 



5 Phi congruence is analogous to the notion of register “webs” (outside of SSA form) defined 
in [9]. 
Given a CSSA form, optimizations such as copy folding and code motion may 
transform the SSA form to a state in which there are phi resource interferences. Let us 
call such an SSA form TSSA (for transformed SSA) form. Our algorithm for 
translating out of the SSA form consists of three steps: 

• Step 1: Translating the TSSA form to a CSSA form; 

• Step 2: Eliminating redundant copies; and 

• Step 3: Eliminating phi instructions and leaving the CSSA form. 

Our approach⁶ to eliminating phi resource interferences in Step 1 relies on liveness 
analysis and interference detection, which are discussed in the next section. Step 2 is 
our new CSSA-based coalescing algorithm (see Section 5). Step 3 is a straightforward 
application of the phi congruence property and elimination of phi instructions (the 
details are not presented). 



3. Liveness and Interference 

A variable v is live at a program point p if there exists a path from p to a use of v that 
contains no definition of v [1]. A traditional bit vector method can be used to analyze 
programs in SSA form for liveness by treating phi instructions specially. Cytron and 
Gershbein [6] made an important observation regarding phi instructions. Given a phi 
instruction x0 = phi(x1:L1, x2:L2, x3:L3, ..., xn:Ln) that textually appears in a basic 
block L0, each use of xi, where 1 <= i <= n, is associated with the end of the 
corresponding predecessor basic block through which xi reaches L0. In this paper we 
associate the definition of a phi instruction with the beginning of the basic block 
where the phi instruction textually appears, i.e., basic block L0, and hence x0 is 
treated as live upon entering L0. 

Given the above special treatment for phi instructions, we can use the traditional 
bit vector technique for liveness analysis, and construct the interference graph. Two 
variables in a program are said to interfere if their live ranges overlap at any program 
point. We will use the following notation to describe the liveness properties at the 
beginning and at the end of each basic block, respectively. 

1. LiveIn[L]: The set of resources that are live at the beginning of basic block L. 

2. LiveOut[L]: The set of resources that are live at the end of basic block L. 
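
The special treatment of phi instructions can be sketched as an iterative backward data flow computation. The sketch below uses Python sets instead of bit vectors; the dictionary layout of a basic block and the three-block control flow graph are hypothetical, chosen only so that the result reproduces the LiveOut sets quoted for Figure 4(a) in Section 4.3.

```python
def liveness(blocks):
    """Compute LiveIn/LiveOut with the phi conventions described above.

    blocks: {label: {"succs": [labels],
                     "defs": set of ordinary (non-phi) definitions,
                     "uses": set of upward-exposed uses of ordinary instructions,
                     "phis": [(target, {pred_label: source})]}}
    This block encoding is an assumption made for the sketch.
    """
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for label, blk in blocks.items():
            out = set()
            for s in blk["succs"]:
                succ = blocks[s]
                succ_targets = {t for t, _ in succ["phis"]}
                # A phi target is defined on entry to s, so it is not live
                # at the end of the predecessor on account of s.
                out |= live_in[s] - succ_targets
                # A phi source is used at the end of the predecessor
                # through which it flows.
                out |= {srcs[label] for _, srcs in succ["phis"] if label in srcs}
            own_targets = {t for t, _ in blk["phis"]}
            # Phi targets are treated as live upon entering their block.
            inn = own_targets | blk["uses"] | (out - blk["defs"])
            if out != live_out[label] or inn != live_in[label]:
                live_out[label], live_in[label] = out, inn
                changed = True
    return live_in, live_out


# Hypothetical CFG: L1 defines x1 and x2 and branches to L2 and L3;
# L2 falls through to L3; L3 starts with x3 = phi(x1:L1, x2:L2).
cfg = {
    "L1": {"succs": ["L2", "L3"], "defs": {"x1", "x2"}, "uses": set(), "phis": []},
    "L2": {"succs": ["L3"], "defs": set(), "uses": set(), "phis": []},
    "L3": {"succs": [], "defs": set(), "uses": {"x3"},
           "phis": [("x3", {"L1": "x1", "L2": "x2"})]},
}
live_in, live_out = liveness(cfg)
```

In this example x2 is live at the end of L1 only because it must survive L2 to reach the phi instruction, which is exactly the overlap with x1 that makes the two resources interfere.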



4. Translating TSSA Form to CSSA Form 

The process of translating TSSA form to CSSA form ensures that none of the 
resources referenced within a phi instruction interfere with each other. Here we 
present three methods for translating a TSSA form to a CSSA form. 



6 Although there are known SSA-based transformations that preserve CSSA form, e.g. [4], they 
often require the computation of additional data flow information, e.g., availability and 
anticipatability. Furthermore, insisting on CSSA form may constrain some optimization 
opportunities. 
4.1 Method I: Naive Translation 



In this method we naively insert copies for all resources referenced in a phi 
instruction; one copy is inserted for each source resource in the corresponding basic 
block feeding the value to the phi instruction, and one copy is inserted in the same 
basic block as the phi instruction for the target resource. Figure 4 illustrates this naive 
translation. 

This method is very simple to implement but introduces many redundant copies. 
The redundant copies can be eliminated using our CSSA-based coalescing algorithm 
(see Section 5). It is important to note that when a resource of a phi instruction is 
spilled we always ensure that the new resource is referenced only in the phi 
instruction. This spilling technique is also used in Method II and Method III. 
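
A minimal Python sketch of this naive splitting follows. The function name, the tuple encoding of operands, and the use of a trailing prime to form fresh names are assumptions for illustration; the sketch only computes which copies to place and does not edit a real instruction stream.

```python
def method1_split(target, sources):
    """Naively split every operand of `target = phi(sources)`.

    sources: list of (resource, predecessor_label) pairs.
    Returns (new_phi, pred_copies, target_copy):
      new_phi      (new_target, [(new_source, predecessor_label), ...])
      pred_copies  {predecessor_label: [(dst, src), ...]}, placed at block end
      target_copy  (dst, src), placed right after the phi instruction
    """
    used = set()

    def fresh(name):
        # Append primes until the name is unused, so each new resource
        # still has a single definition.
        new = name + "'"
        while new in used:
            new += "'"
        used.add(new)
        return new

    new_target = fresh(target)
    new_sources = []
    pred_copies = {}
    for src, pred in sources:
        new_src = fresh(src)
        new_sources.append((new_src, pred))
        pred_copies.setdefault(pred, []).append((new_src, src))
    target_copy = (target, new_target)
    return (new_target, new_sources), pred_copies, target_copy


new_phi, pred_copies, target_copy = method1_split("x0", [("x1", "L1"), ("x2", "L2")])
```

After the split, the primed names are referenced only by the phi instruction and by the copies that feed or drain it, which is the property the text requires of every spilled resource.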





Figure 4. An example of translating TSSA form (a) to CSSA form (b) using Method I. 



Figure 5. Translation to CSSA form using Method II. 



4.2 Method II: Translation Based on Interference Graph Update 

In this method we will insert copies only if resources of phi instructions interfere. 
Consider the example shown in Figure 4(a). Here we can see that x1 and x2 interfere 
with each other. To eliminate the interference we insert two copies, one for x1 and 
one for x2. (Since x3 does not interfere with either x1 or x2, no new copy is inserted 
for x3.) Next we incrementally update the interference graph and the phi congruence 
classes. Applying this algorithm to the example in Figure 4(a) results in the CSSA 
form shown in Figure 5. Notice that we have inserted two copies. But it is evident 
from the figure that only the copy x2’=x2 is needed to ensure correctness. Since the 
interference graph does not carry control flow information, it is difficult to determine 
when copy instructions are redundant using the interference graph alone. Also, 
without the liveness information, the interference graph update will be conservative. 
In the next section we will show how to further reduce the number of copies that are 
needed to correctly eliminate phi instructions by using both interference and liveness 
information for guiding copy insertions. 

It is important to observe that while deciding whether copy instructions are needed 
for x1 and x2, in addition to checking the interference between x1 and x2, we must 
also check for interferences with all other resources that will be replaced with the 
same names as x1 and x2 after translating out of SSA form (i.e., all members of their 
phi congruence class). We will elaborate on this in the next section. 
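
The selection rule of Method II, including the check against all members of the phi congruence classes, might be sketched as follows. The function name, the dictionary of classes, and the edge-set representation of the interference graph are assumptions for illustration; the incremental updates of the graph and the classes are not modeled.

```python
from itertools import combinations


def method2_candidates(phi_resources, phi_class, interferes):
    """Return the phi resources that receive a copy under Method II.

    phi_resources: names referenced by one phi instruction.
    phi_class:     {resource: set of members of its phi congruence class}.
    interferes:    predicate on two resource names (interference graph lookup).
    """
    candidates = set()
    distinct = list(dict.fromkeys(phi_resources))  # drop repeated operands
    for a, b in combinations(distinct, 2):
        class_a = phi_class.get(a, {a})
        class_b = phi_class.get(b, {b})
        # Interference between any members of the two classes counts.
        if any(interferes(p, q) for p in class_a for q in class_b):
            candidates.update((a, b))  # Method II copies both resources
    return candidates


# Interference graph for the situation described for Figure 4(a):
# only x1 and x2 interfere.
edges = {frozenset({"x1", "x2"})}
singleton = {r: {r} for r in ("x1", "x2", "x3")}
needs_copy = method2_candidates(
    ["x3", "x1", "x2"], singleton, lambda p, q: frozenset({p, q}) in edges
)
```

The result contains x1 and x2 but not x3, matching the two copies the text reports for this method.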



4.3 Method III: Translation Based on Data Flow and Interference Graph 
Updates 

Since an interference graph does not carry control flow information we may insert 
more copies than necessary in Method II to eliminate the phi resource interferences. 
In this section we will use liveness information in addition to the interference graph to 
further reduce the number of copies that are necessary to eliminate the phi resource 
interferences. 

To motivate the new algorithm, consider Figure 4(a). Here x1 and x2 interfere, 
LiveOut[L1] = {x1, x2}, and LiveOut[L2] = {x2}. Notice that x2 is in LiveOut[L1], 
and we claim that inserting a new copy x1’=x1 only in L1 will not eliminate the phi 
interference (i.e., we still need to insert a copy in L2 to eliminate the interference). 
This is because the target of the new copy will interfere with x2. Now since x1 is not 
in LiveOut[L2] we can eliminate the phi interference by inserting a new copy x2’=x2 
only in L2 (i.e., a copy is not needed in L1). Notice that we used LiveOut sets to 
eliminate interference among phi source resources. To eliminate interferences 
between the target resource and a source resource in a phi instruction we use LiveIn 
and LiveOut sets (see the “lost-copy” problem and the “swap” problem discussed 
later in the section). The complete algorithm for eliminating phi resource 
interferences based on data flow and interference graph updates is given in the 
appendix. A unique aspect of this algorithm is that any copy it places cannot be 
eliminated by the standard interference graph based coalescing algorithm (assuming 
there are no dead phi instructions). In other words, the source and target resources of 
the copies inserted by this method will interfere when we leave the SSA form. Also, 
the algorithm precisely updates both the interference graph and liveness information. 
Note that this algorithm does not ensure that the number of copies inserted to 
eliminate phi resource interference is minimum. (We believe that the problem of 
ensuring a minimum number of copies inserted to correctly eliminate phi instructions 
is still to be formulated, e.g., based on a static count or a dynamic count, and remains 
unresolved.) 
The first step in the algorithm is to initialize the phi congruence classes such that 
each resource in the phi instruction belongs to its own congruence class. These classes 
will be merged after eliminating interferences among them. The crux of the algorithm 
is to first check whether, for any pair of resources xi:Li and xj:Lj in a phi instruction, 
where 0 <= i, j <= n, n is the number of phi resource operands, and xi != xj, there 
exist resources yi in phiCongruenceClass[xi] and yj in phiCongruenceClass[xj] such that yi 
and yj interfere. If so we will insert copies to ensure that xi and xj will not be put in 
the same phi congruence class. Consider the case in which both xi and xj are source 
resources in the phi instruction⁷. There are four cases to consider when inserting copy 
instructions for resources in the phi instruction. 















Case 1. The intersection of phiCongruenceClass[xi] and LiveOut[Lj] is not 
empty, and the intersection of phiCongruenceClass[xj] and LiveOut[Li] is empty. 
A new copy, xi’=xi, is needed in Li to ensure that xi and xj are put in different 
phi congruence classes. So xi is added to candidateResourceSet. 

Case 2. The intersection of phiCongruenceClass[xi] and LiveOut[Lj] is empty, 
and the intersection of phiCongruenceClass[xj] and LiveOut[Li] is not empty. A 
new copy, xj’=xj, is needed in Lj to ensure that xi and xj are put in different phi 
congruence classes. So xj is added to candidateResourceSet. 

Case 3. The intersection of phiCongruenceClass[xi] and LiveOut[Lj] is not 
empty, and the intersection of phiCongruenceClass[xj] and LiveOut[Li] is not 
empty. Two new copies, xi’=xi in Li and xj’=xj in Lj, are needed to ensure that xi 
and xj are put in different phi congruence classes. So xi and xj are added to 
candidateResourceSet. 

Case 4. The intersection of phiCongruenceClass[xi] and LiveOut[Lj] is empty, 
and the intersection of phiCongruenceClass[xj] and LiveOut[Li] is empty. Either 
a copy, xi’=xi in Li, or a copy, xj’=xj in Lj, is sufficient to eliminate the 
interference between xi and xj. However, the final decision of which copy to 
insert is deferred until all pairs of interfering resources in the phi instruction are 
processed. 
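
The four cases can be written down as a small decision function. This is a sketch for the situation where both xi and xj are source operands and their classes are already known to interfere; the function name and the use of Python sets for phi congruence classes and LiveOut sets are assumptions for illustration.

```python
def classify_pair(xi, li, xj, lj, phi_class, live_out):
    """Return (case number, resources to add to candidateResourceSet).

    xi flows into the phi from block li and xj from block lj. An empty
    set together with case 4 means the decision is deferred.
    """
    i_reaches_j = bool(phi_class[xi] & live_out[lj])
    j_reaches_i = bool(phi_class[xj] & live_out[li])
    if i_reaches_j and not j_reaches_i:
        return 1, {xi}          # copy xi' = xi in li
    if j_reaches_i and not i_reaches_j:
        return 2, {xj}          # copy xj' = xj in lj
    if i_reaches_j and j_reaches_i:
        return 3, {xi, xj}      # both copies are needed
    return 4, set()             # either copy would do: defer


# The situation described for Figure 4(a).
classes = {"x1": {"x1"}, "x2": {"x2"}}
fig4_live_out = {"L1": {"x1", "x2"}, "L2": {"x2"}}
fig4 = classify_pair("x1", "L1", "x2", "L2", classes, fig4_live_out)
```

For the Figure 4(a) sets the function selects Case 2, that is, a copy x2’=x2 in L2, which agrees with the argument given at the start of this section.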



LiveOut[L1] = {x1} 
LiveOut[L2] = {x2} 
LiveOut[L3] = {x1, x3} 




Figure 6. An example of TSSA form with LiveOut sets. 



7 The situation is exactly the same when one of xi or xj is a target resource, except 
that we use both LiveIn and LiveOut sets to decide copy placement. 
By deferring copy placement in Case 4 we avoid placing redundant copies. To see this 
consider the TSSA program shown in Figure 6. Initially phiCongruenceClass[x1] = 
{x1}, phiCongruenceClass[x2] = {x2}, and phiCongruenceClass[x3] = {x3}. The 
LiveOut sets for L1, L2, and L3 are shown in the figure. Since the live ranges of x1 
and x2 interfere, new copies are needed to break the phi resource interference. We can 
see that x1 is not in LiveOut[L2] and x2 is not in LiveOut[L1]. Therefore, we can 
eliminate the phi resource interference between them by inserting a copy either in L1 
or in L2. Now rather than deferring the copy insertion let us insert a new copy x2’=x2 
in L2. Since x1 and x3 also interfere with each other, copies are needed to eliminate 
the interference on this pair of resources. Here we can see that x1 appears in 
LiveOut[L3], so we must insert a new copy x1’=x1 in L1 to eliminate the phi resource 
interference between them. By inserting this copy in L1 we can immediately see that 
the copy x2’=x2 inserted earlier is redundant. Therefore, to avoid inserting redundant 
copies we defer copy insertion for Case 4 and keep track of resources for which we 
have not resolved copy insertion in a map called the unresolvedNeighborMap. Each 
time the copy insertion is deferred (unresolved) for a pair of resources xi and xj, we 
add xi to the set unresolvedNeighborMap[xj] and xj to the set 
unresolvedNeighborMap[xi]. Once all the resources in the phi instruction are processed, we 
handle the unresolved resources. We pick resources from the map in decreasing order 
of the size of their unresolved resource sets. For each resource x that is picked from 
the map, we add x to candidateResourceSet if x has at least one unresolved neighbor, 
and we mark x as resolved. Finally, when all the maps 
are processed, it is possible that all the neighbors of a resource x that was marked as 
resolved are now also marked as resolved. If this is the case we remove x 
from candidateResourceSet. 
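
One possible reading of this deferred resolution step is a greedy covering of the unresolved pairs, sketched below in Python. The function name, the treatment of resources already selected by Cases 1 to 3 as resolved, and the alphabetical tie-break are assumptions of this sketch; the algorithm in the appendix remains the authoritative description.

```python
def resolve_deferred(unresolved, candidates):
    """Choose copies for the pairs deferred in Case 4.

    unresolved: {resource: set of Case 4 neighbours} (unresolvedNeighborMap).
    candidates: resources already selected by Cases 1 to 3.
    Returns the final candidateResourceSet.
    """
    chosen = set(candidates)
    added = []
    # Visit resources with the most unresolved neighbours first.
    for x in sorted(unresolved, key=lambda r: (-len(unresolved[r]), r)):
        if x in chosen:
            continue  # every pair involving x is already broken by x's copy
        if any(n not in chosen for n in unresolved[x]):
            chosen.add(x)
            added.append(x)
    # A resource added in this phase is redundant if all of its neighbours
    # ended up with copies of their own.
    for x in added:
        if all(n in chosen for n in unresolved[x]):
            chosen.discard(x)
    return chosen


# Figure 6: the pair (x1, x2) was deferred, and Case 1 on the pair
# (x1, x3) has already put x1 into candidateResourceSet.
final = resolve_deferred({"x1": {"x2"}, "x2": {"x1"}}, {"x1"})
```

For Figure 6 only x1 is selected, so the redundant copy x2’=x2 discussed above is never placed.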

Once all resources that need copies are put in the candidateResourceSet, we insert 
copies for these resources as in Method I or Method II. We also update the liveness 
information, the interference graph, and the phi congruence class. The details of these 
updates are given in the appendix. Next we illustrate the application of our algorithm 
to two problems, the ‘lost-copy’ problem and the ‘swap’ problem, discussed in [2]. 

The Lost-Copy Problem. Figure 7 illustrates the lost copy problem. Figure 7(a) 
shows the original code. Figure 7(b) shows the TSSA form (with copies folded). If we 
use the algorithm in [7] to eliminate the phi instruction, the copy y=x would be lost in 
the translation. 




Figure 7. An example of the lost copy problem: (a) original code, (b) TSSA form, (c) after copy insertion. 
Let us apply our algorithm to this problem. From Figure 7(b) we can see that x2 
and x3 interfere, x2 is in LiveOut[L2], and x3 is not in LiveIn[L2]. So a new copy 
x2=x2’ is inserted in L2 for x2. Once we do this, as shown in Figure 7(c), we can 
simply eliminate the phi instruction after replacing references to all its resources by a 
representative resource. 

The Swap Problem. Let us apply the algorithm to the swap problem in [2] shown in 
Figure 8. The original code for the swap problem is given in Figure 8(a), and the 
corresponding CSSA form is given in Figure 8(b). After we perform copy folding on 
the CSSA form, we get the TSSA form shown in Figure 8(c). 





Figure 8. An example of the swap problem: (a) original code, (b) CSSA form, (c) TSSA form. 



Consider the TSSA form shown in Figure 8(c). We first initialize the phi 
congruence classes of resources referenced in the two phi instructions by putting each 
resource in its own class (e.g., phiCongruenceClass[x1]={x1}, 
phiCongruenceClass[x2]={x2}, etc.). Next, using liveness analysis we can derive 
LiveOut[L2]={x2,y2} and LiveIn[L2]={x2,y2}. Now consider the first phi 
instruction, where we can see that x2 and y2 interfere with each other. Note that the 
use of x2 in the second phi instruction occurs at the end of L2 (see Section 3 for 
details). Also, notice that y2 is in LiveIn[L2] and that x2 is in LiveOut[L2]. Therefore 
we will insert two copies, one for x2 and one for y2. The resulting program is shown 
in Figure 9(a). After inserting the copies we incrementally update the LiveIn set, the 
LiveOut set, and the interference graph to reflect the new changes. The new sets are 
LiveIn[L2]={x2’,y2} and LiveOut[L2]={x2,y2’}. We will also update and merge the 
phi congruence classes for resources in the phi instruction so that resources x1, x2’, 
and y2’ are put in the same phi congruence class. Now consider the second phi 
instruction and notice that x2 and y2 still interfere. We can see that x2 is not in 
LiveIn[L2] and y2 is not in LiveOut[L2]. So only one copy is needed to eliminate the 
phi interference. The resulting program is shown in Figure 9(b). We can see that we 
have inserted only three copies and all three copies are essential. 
LiveIn[L2] = {x2’, y2} 

LiveOut[L2] = {x2, y2’} 

(a) (b) 

Figure 9. Breaking interferences in the swap problem. 



5. SSA Based Coalescing 

Once phi instructions have been eliminated the algorithms given in [7] and in [2] rely 
on Chaitin’s coalescing algorithm to eliminate as many redundant copies as possible. 
Consider the CSSA form shown in Figure 10(a). Since none of the resources within 
phi instructions interferes with each other, we can eliminate the phi instruction by 
using the phi congruence property. The resulting program is shown in Figure 10(b). 
Now let us apply Chaitin’s algorithm to eliminate the copy x=y [3]. Since x and y 
interfere with each other this copy cannot be eliminated. 




Figure 10. Copy elimination using Chaitin’s coalescing algorithm. 



In this section we present a new CSSA-based coalescing algorithm that will allow 
us to eliminate the copy x1=y1. The key intuition behind our algorithm is that we can 
eliminate a copy x=y even when their live ranges interfere, so long as the coalesced 
live range does not introduce any phi resource interference. 

Let x=y be the copy that we wish to eliminate. Assume that they are not in the 
same phi congruence class. (It is trivial otherwise.) There are three cases to consider: 

• Case 1: phiCongruenceClass[x]=={} and phiCongruenceClass[y]=={}. This 
means that x and y are not referenced in any phi instruction. The copy can be 
removed even if x and y interfere. 

• Case 2: phiCongruenceClass[x]=={} and phiCongruenceClass[y]!={}. If x 
interferes with any resource in (phiCongruenceClass[y]-y) then the copy cannot 
be removed; otherwise it can be removed. The situation is symmetric for 
the case where phiCongruenceClass[x]!={} and phiCongruenceClass[y]=={}. 

• Case 3: phiCongruenceClass[x]!={} and phiCongruenceClass[y]!={}. The copy 
cannot be removed if any resource in phiCongruenceClass[x] interferes with any 
resource in (phiCongruenceClass[y]-y) or if any resource in 
phiCongruenceClass[y] interferes with any resource in (phiCongruenceClass[x]- 
x), otherwise it can be removed. 

Consider the example shown in Figure 10(a). From the figure we can see that 
phiCongruenceClass[x1] = {x1, x2, x3} and phiCongruenceClass[y1] = {y1, y2, y3}. We can 
see that x1 and y1 interfere, but we can still eliminate the copy using Case 3. After 
eliminating the copy, x1 = y1, the two phi congruence classes are merged. Since none 
of the resources in the phi congruence class interferes with each other we can eliminate 
the phi instruction by replacing all references to resources in the merged phi 
congruence class with a representative resource. The resulting program no longer has 
the copy, x = y. Our heuristic for handling Case 3 is conservative but safe. One can 
easily enhance this heuristic so that more copies can be eliminated while still ensuring 
that the phi congruence property is satisfied. 
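
The three cases translate directly into a predicate on a copy x=y. The sketch below assumes that x and y are not already in the same phi congruence class, represents the class of a resource that is referenced in no phi instruction as an empty set, and takes the interference graph as a predicate; the function name and these encodings are assumptions for illustration.

```python
def can_coalesce(x, y, phi_class, interferes):
    """Decide whether the copy x = y may be removed (Cases 1 to 3 above)."""
    cx = frozenset(phi_class.get(x, ()))
    cy = frozenset(phi_class.get(y, ()))

    def any_interference(left, right):
        return any(interferes(a, b) for a in left for b in right)

    if not cx and not cy:
        # Case 1: neither resource is referenced in a phi instruction.
        return True
    if not cx:
        # Case 2: only y belongs to a phi congruence class.
        return not any_interference({x}, cy - {y})
    if not cy:
        # Case 2, mirrored: only x belongs to a phi congruence class.
        return not any_interference({y}, cx - {x})
    # Case 3: both belong to (different) phi congruence classes.
    return not (any_interference(cx, cy - {y}) or any_interference(cy, cx - {x}))


# Figure 10(a): only x1 and y1 interfere.
fig10_classes = {
    "x1": {"x1", "x2", "x3"},
    "y1": {"y1", "y2", "y3"},
}
fig10_edges = {frozenset({"x1", "y1"})}
removable = can_coalesce(
    "x1", "y1", fig10_classes, lambda a, b: frozenset({a, b}) in fig10_edges
)
```

Because the interference between x1 and y1 themselves is deliberately excluded from the checks, the copy is removable here even though Chaitin’s test on the two live ranges would reject it.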



6. Experimental Results 

To demonstrate the effectiveness of our approach we implemented all three methods 
for translating out of SSA form. We also implemented our CSSA based coalescing. 
The experimental results are summarized in Table 1. We ran our experiments on a 
number of procedures taken from SPECint95 and other application suites. Here we 
present our results for a set of ten representative procedures. Nine out of the ten 
procedures are from the SPECint95 suite, and one is from operating system source 
code. One typical characteristic of the ten procedures is that they are large. In our 
compiler, optimizations, such as global code motion and common sub-expression 
elimination, introduce phi resource interferences. All of the dead phi instructions have 
been pruned as part of our SSA construction phase. 

Note that reducing the compilation time and space usage is a key motivation to 
develop the third method to translate out of SSA form. For all three methods we 
present two kinds of data; the first kind represents space usage and the second kind 
represents running time. In Table 1, BT indicates the number of copies prior to leaving 
the TSSA form, and AT indicates the number of copies after translating to the CSSA 
form. The difference between AT and BT gives the number of copies introduced 
during the translation of the TSSA form to the CSSA form. For the set of benchmarks 
that we experimented with we found that during the translation Method II introduces 
72.1% fewer copies than Method I, and Method III introduces 89.9% fewer copies 
than Method I. The number of copies inserted during the translation process is an 
indication of space usage. Thus, we can see that both Method II and Method III 
outperform Method I in terms of space efficiency. 

AC in the table indicates the number of copies that exist after applying our CSSA 
based coalescing to the CSSA form. For both Method II and Method III we did not 
recompute the data flow information and the interference graph, but relied on the 
incremental update performed by the two methods. Since, for Method II, the 
interference graph update is conservative, after coalescing there are, on average, 
29.1% more copies than for Method I. Interestingly, after coalescing there are 8.6% 
fewer copies using Method III compared to Method I. It is important to note that for 
Method III, the subsequent coalescing does not eliminate any copy instruction 
introduced during the TSSA-to-CSSA translation, but instead it eliminates only
redundant copies that were present prior to the translation. Our experimental results
corroborate this behavior (in many cases the AC value is less than the BT value for
Method III). After copy elimination, Method II clearly has the most copies left, and
Methods I and III have comparable numbers of copies left in most of the cases. For
procedures yylex(gcc), yylex(perl), and ttin(o/s code), however, Method III is very
effective in reducing the number of copies in the final code (an improvement of more
than 24%). One reason for this dramatic improvement is that the coalescing algorithm
has to eliminate more copies in Method I than in Method III. The copy elimination
order is important for Method I since it impacts the total number of copies that are
eliminated. Since Method III places only the copies that cannot be eliminated by
coalescing, it is less affected by the elimination order. In summary, we conclude that
Method II and Method III have better space efficiency than Method I, and Method III
is the most effective in terms of reducing the number of copies in the final code.

Table 1: Empirical result for the three methods.

Procedure Name           BT  Method    AT  AT-BT    AC   DF/IG      TT      TC   Total
                                                        (secs)  (secs)  (secs)  (secs)
Part delete (vortex)     86  I        112     26    92    0.08    0.01    0.02    0.11
                             II       100     14    99            0.01    0.01    0.11
                             III       92      6    91            0.01    0.02    0.12
Yyparse (gcc)           723  I        918    195          2.63    0.04    0.50    3.17
                             II       761     38          2.50    0.05    0.43    2.98
                             III      738     15          2.64    0.03    0.42    3.09
Yylex (perl)           1060  I       2901   1841          3.51                    5.84
                             II      1309    249          2.77                    4.15
                             III     1134     74          2.93                    3.93
Yylex (gcc)            1573  I       4632          660    6.37            3.78   10.75
                             II      1825          670    5.30            1.61    7.81
                             III     1648          493    5.86            1.62    7.86
Reload (gcc)            344  I       1378          410    3.56    0.15    0.87    4.58
                             II       802          675    2.73    0.74    0.35    3.82
                             III      525          385    3.15    0.25    0.27    3.67
Iscaptured (go)          61  I        194           55            0.01    0.04    0.17
                             II       138          104            0.03    0.02    0.16
                             III       89           57            0.02    0.02    0.16
Cse insn (gcc)          396  I       1476   1080                  0.18    0.69    3.92
                             II       698    302          2.61    0.53    0.32    3.46
                             III      492     96          2.85    0.18    0.27    3.30
Eval (perl)            1375  I       3224   1849   619    7.14    0.40    2.50   10.04
                             II      1546    171   717    4.45    0.82    1.05    6.32
                             III     1456     81   624    5.80    0.20    1.13    7.13
Ttin (o/s code)         539  I       2389   1850   826    3.44    0.20    1.41    5.05
                             II      1369         1201    2.43    2.21    0.30    4.94
                             III      761    222   600    2.59    0.87    0.18    3.64
load data (m88ksim)     163  I        183           53    0.11    0.01    0.04    0.16
                             II       163           53    0.10    0.01    0.03    0.14
                             III      163           53    0.11    0.01    0.04    0.16
%improvement*                I          *      *     *       *       *       *       *
                             II         *   72.1 -29.1       *       *       *    13.1
                             III        *   89.9   8.6       *       *       *    15.1

* Percentage improvement is first calculated with respect to Method I for each
procedure and is then averaged over all procedures.

Next we will discuss the running time for all three methods. The column DFA/IG 
indicates the time for computing the data flow information and the interference graph. 
The column TT indicates the time for translating the TSSA form to the CSSA form, 
and TT also includes the time to incrementally update data flow information and 
interference graph for the relevant methods. Column TC indicates the time for our 
CSSA based coalescing. Finally, the last column indicates the summation of DFA/IG, 
TT, and TC. 

From the table we can see that, for all three methods, computing the data flow
information and the interference graph dominates the overall running time. The time
in DFA/IG tends to be longer for Method I because of the large number of extra copies
inserted during TSSA-to-CSSA translation. Since Method III also tracks additional
live-in sets, it takes more time than Method II for computing the data flow
information. Method II needs to update data flow sets and the interference graph for a
relatively large number of copy instructions inserted during its TSSA-to-CSSA
translation, so it usually requires the longest time under TT. Compared to the other
two methods, the copy elimination in Method I has many more copy instructions to
eliminate, and thus Method I takes a longer time under TC.

By examining the total running time, we see that Method III performs significantly
better than Method I in more than half of the cases and comparably in the rest. On
average it is
about 15% faster than Method I. Although Method II has the fastest running time in 
some cases, this method is not very effective in reducing the number of copies in the 
final code. 
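
The averaging rule stated under Table 1 (improvement relative to Method I per procedure, then averaged over procedures) can be checked against the Total column. The following is a minimal sketch in Python; the list and function names are ours, and only the Method I and Method III totals from Table 1 are used.

```python
# Total running times (secs) from Table 1, in row order
# (Part delete, Yyparse, Yylex/perl, Yylex/gcc, Reload, Iscaptured,
#  Cse insn, Eval, Ttin, load data).
TOTAL_I = [0.11, 3.17, 5.84, 10.75, 4.58, 0.17, 3.92, 10.04, 5.05, 0.16]
TOTAL_III = [0.12, 3.09, 3.93, 7.86, 3.67, 0.16, 3.30, 7.13, 3.64, 0.16]


def mean_percent_improvement(baseline, candidate):
    """Average of the per-procedure improvements over the baseline, in percent."""
    per_procedure = [100.0 * (b - c) / b for b, c in zip(baseline, candidate)]
    return sum(per_procedure) / len(per_procedure)


improvement_iii = mean_percent_improvement(TOTAL_I, TOTAL_III)
print(f"Method III vs. Method I: {improvement_iii:.1f}% less total time on average")
```

Averaging per-procedure percentages, rather than comparing summed times, keeps the large procedures from dominating the result; the small procedures count equally.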



7. Discussion and Conclusion 

In this paper we presented a new framework for translating out of SSA form and for 
eliminating redundant copies. The previous work most relevant to ours is due to
Cytron et al. [7] and Briggs et al. [2]. Both methods pessimistically insert copies for 
all the source resources of a phi instruction. Our Method I and the previous two 
methods rely on a subsequent coalescing phase to eliminate redundant copies. Our 
experimental results indicate that pessimistically inserting copies increases the 
space/time requirements of the algorithm. 

Cytron et al. never insert copies for the target resource of a phi instruction. The 
work in [2] showed that this leads to incorrect code generation in certain cases when a 
transformation, such as value numbering, is performed on the SSA form. Briggs et al. 
[2] illustrated two examples, the “lost copy” problem and the “swap” problem, where
the original Cytron et al. algorithm [7] would fail. They then proposed algorithms to
handle both of these problems. Unlike our Method I, they judiciously place copies for
the target resource of a phi instruction to handle the two special cases. The solution to
the swap problem requires ordering on copy insertion, and the solution to the lost
copy problem requires data flow live out information. To summarize their algorithm,
basic blocks in the control flow graph of a program are visited in a preorder walk over
the dominator tree. Each basic block is then iterated to replace uses in phi instructions
and other instructions with any new names previously generated during the preorder
walk. A list of copies that are needed in this basic block is built and inserted in the
order determined by the algorithm that handles the swap problem. Finally, as each
copy is inserted, an algorithm to handle the lost copy problem is invoked (which uses
live out information to ensure that a needed copy is not lost).
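
Why the swap problem needs an ordering on copy insertion can be seen in a small simulation. The sketch below is ours and is only an illustration, not the algorithm of [2]: it models two phi instructions whose targets are each other's sources, and compares naive sequential copies with the parallel semantics that phi instructions have. The function names and the temporary `t` are assumptions.

```python
def naive_copies(env, copies):
    """Execute copies one after another; each copy sees earlier writes."""
    env = dict(env)
    for dst, src in copies:
        env[dst] = env[src]
    return env


def parallel_copies(env, copies):
    """Execute copies with phi semantics: read every source, then write."""
    values = [env[src] for _, src in copies]
    env = dict(env)
    for (dst, _), value in zip(copies, values):
        env[dst] = value
    return env


def swap_with_temporary(env, a, b, temp="t"):
    """Sequential code for the swap that saves one value in a temporary first."""
    return naive_copies(env, [(temp, a), (a, b), (b, temp)])


swap = [("a", "b"), ("b", "a")]  # a = phi(..., b) and b = phi(..., a) on a back edge
start = {"a": 1, "b": 2}

print(naive_copies(start, swap))            # {'a': 2, 'b': 2}: the old value of a is lost
print(parallel_copies(start, swap))         # {'a': 2, 'b': 1}
print(swap_with_temporary(start, "a", "b"))  # {'a': 2, 'b': 1, 't': 1}
```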

We presented a uniform framework for eliminating phi instructions. Our 
framework is based on two important properties: (1) the phi congruence property, and 
(2) the property that none of the resources in a phi congruence class interfere. We use 
liveness analysis and an interference graph to eliminate phi instructions. In [2] the 
authors remark that they reduced the problem of eliminating phi instructions to a
scheduling problem. In our framework we have reduced the problem of eliminating 
phi instructions to a coloring-based register allocation problem [3]. In this paper we 
presented one strategy for spilling copies, where copies for source resources of a phi 
instruction are inserted in the predecessor basic blocks, and a copy is inserted for the 
target resource in the same basic block as the phi instruction. One can envision other 
spilling strategies that can further reduce the number of copies needed to correctly
eliminate phi instructions. Note that, unlike the Briggs et al. algorithm, our framework 
does not use any structural properties of the control flow graph or the dependence 
graph induced by the SSA names (the SSA graph) to ensure that the copies are placed 
correctly. We also do not visit the basic blocks in any particular order to ensure that 
copies are placed correctly. Another unique aspect of our framework is the notion of 
phi congruence property of the CSSA form. We exploited this property to translate 
TSSA form to CSSA form by breaking interferences among resources in a phi 
congruence class, and then eliminating phi instructions in the CSSA form. Our new 
CSSA-based algorithm also uses phi congruence classes to correctly and aggressively 
eliminate copies. 

Finally, it is important to note that although our framework handles all control flow 
structures including loops with multiple exits and irreducible control flow and does 
not require any explicit control flow structure for correctness, transformations, such as 
edge splitting, help reduce the number of copies that are inserted in our phi instruction 
elimination phase. They also help remove more copies during our CSSA-based
coalescing. We have also observed that visiting basic blocks in a certain order helps 
improve the effectiveness of both the phi instruction elimination phase and the copy 
elimination phase. These issues are part of future work and beyond the scope of this 
paper. 
Appendix: Complete Algorithm for Method III

Algorithm A: Algorithm for eliminating phi resource interferences based on data flow and
interference graph updates.

eliminatePhiResourceInterference()

Inputs: instruction stream, control flow graph (CFG), LiveIn and LiveOut sets, interference
graph

Outputs: instruction stream, LiveIn and LiveOut sets, interference graph, phi congruence
classes

{
1:  for each resource, x, participating in a phi
        phiCongruenceClass[x] = {x};

2:  for each phi instruction (phiInst) in CFG {
        phiInst is of the form x0 = phi(x1:L1, x2:L2, ..., xn:Ln);
        L0 is the basic block containing phiInst;

3:      Set candidateResourceSet;
        for each xi, 0 <= i <= n, in phiInst
            unresolvedNeighborMap[xi] = {};

4:      for each pair of resources xi:Li and xj:Lj in phiInst, where 0 <= i, j <= n and xi != xj,
        such that there exists yi in phiCongruenceClass[xi], yj in phiCongruenceClass[xj],
        and yi and yj interfere with each other, {
            Determine what copies are needed to break the interference between xi and xj
            using the four cases described in Section 4.3.
        }

5:      Process the unresolved resources (Case 4) as described in Section 4.3.

6:      for each xi in candidateResourceSet
            insertCopy(xi, phiInst);

7:      // Merge phiCongruenceClass's for all resources in phiInst.
        currentphiCongruenceClass = {};
        for each resource xi in phiInst, where 0 <= i <= n {
            currentphiCongruenceClass += phiCongruenceClass[xi];
            Let phiCongruenceClass[xi] simply point to currentphiCongruenceClass;
        }
    }

8:  Nullify phi congruence classes that contain only singleton resources.
}

insertCopy(xi, phiInst)
{
    if( xi is a source resource of phiInst ) {
        for every Lk associated with xi in the source list of phiInst {
            Insert a copy inst: xnew_i = xi at the end of Lk;
            Replace xi with xnew_i in phiInst;
            Add xnew_i in phiCongruenceClass[xnew_i];
            LiveOut[Lk] += xnew_i;
            if( for Lj an immediate successor of Lk, xi not in LiveIn[Lj] and not used in a phi
                instruction associated with Lk in Lj )
                LiveOut[Lk] -= xi;
            Build interference edges between xnew_i and LiveOut[Lk];
        }
    } else {
        // xi is the phi target, x0.
        Insert a copy inst: x0 = xnew_0 at the beginning of L0;
        Replace x0 with xnew_0 as the target in phiInst;
        Add xnew_0 in phiCongruenceClass[xnew_0];
        LiveIn[L0] -= x0;
        LiveIn[L0] += xnew_0;
        Build interference edges between xnew_0 and LiveIn[L0];
    }
}
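
The class bookkeeping in steps 1, 7, and 8 of Algorithm A can be sketched on its own. The Python below is a simplified illustration under our own assumptions: phi instructions are given as (target, sources) pairs, the interference-breaking steps 3 to 6 are omitted entirely, and every member of a merged class is repointed to the merged set (the pseudocode only mentions the resources of the current phi instruction). The function name is hypothetical.

```python
def phi_congruence_classes(phis):
    """Merge phi congruence classes for a list of (target, sources) phi instructions.

    Only steps 1, 7, and 8 of Algorithm A are modeled; no copies are inserted.
    """
    # Step 1: every resource that participates in a phi starts in its own class.
    classes = {}
    for target, sources in phis:
        for resource in [target, *sources]:
            classes.setdefault(resource, {resource})

    # Step 7: merge the classes of all resources of each phi instruction.
    for target, sources in phis:
        merged = set()
        for resource in [target, *sources]:
            merged |= classes[resource]
        for resource in merged:
            classes[resource] = merged  # all members share the same set object

    # Step 8: drop classes that contain a single resource.
    return {r: c for r, c in classes.items() if len(c) > 1}


# a3 = phi(a1, a2) and a5 = phi(a3, a4) end up in one class of five resources.
example = phi_congruence_classes([("a3", ["a1", "a2"]), ("a5", ["a3", "a4"])])
print(sorted(example["a1"]))
```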
Abstract. This paper describes a general and powerful method for dead
code analysis and elimination in the presence of recursive data construc- 
tions. We represent partially dead recursive data using liveness patterns 
based on general regular tree grammars extended with the notion of live 
and dead, and we formulate the analysis as computing liveness patterns 
at all program points based on program semantics. This analysis yields a 
most precise liveness pattern for the data at each program point, which is 
significantly more precise than results from previous methods. The anal- 
ysis algorithm takes cubic time in terms of the size of the program in the 
worst case but is very efficient in practice, as shown by our prototype 
implementation. The analysis results are used to identify and eliminate 
dead code. The general framework for representing and analyzing prop- 
erties of recursive data structures using general regular tree grammars 
applies to other analyses as well. 

1 Introduction 

Dead computations produce values that never get used [1]. While programmers
are not likely to write code that performs dead computations, such code ap- 
pears often as the result of program optimization, modification, and reuse [40,1]. 
There are also other programming activities that do not explicitly involve live or 
dead code but rely on similar notions. Examples are program slicing [60,45], spe- 
cialization [45], incrementalization [34,33], and compile-time garbage collection 
[24,21,42,57]. Analysis for identifying dead code, or code having similar proper- 
ties, has been studied and used widely [8,7,25,41,1,24,21,10,26,34,54,45,33,57]. It 
is essentially backward dependence analysis that aims to compute the minimum 
sufficient information needed for producing certain results. We call this dead code 
analysis, bearing in mind that it may be used for many other purposes. 

In recent years, dead code analysis has been made more precise so as to 
be effective in more complicated settings [21,10,26,45,5,33]. Since recursive data 
constructions are used increasingly widely in high-level languages [52,14,37,3], an 
important problem is to identify partially dead recursive data — that is, recursive 
data whose dead parts form recursive substructures — and eliminate computa- 
tions of them.^ It is difficult because recursive data structures can be defined by 

* The authors gratefully acknowledge the support of NSF under grant CCR-9711253 
and ONR under grants N00014-99-1-0132 and N00014-99-1-0358. 

^ This is different from partial dead code, which is code that is dead on some but not 
all computation paths [26,5]. 
the user, and dead substructures may interleave with live substructures. Several
methods have been studied [24,21,45,33], but all have limitations. 

This paper describes a general and powerful method for analyzing and elim- 
inating dead computations in the presence of recursive data constructions. We 
represent partially dead recursive data using liveness patterns based on general 
regular tree grammars extended with the notion of live and dead, and we formu- 
late the analysis as computing liveness patterns at all program points based on 
program semantics. This analysis yields a most precise liveness pattern for the 
data at each program point. The analysis algorithm takes cubic time in terms of 
the size of the program in the worst case but is very efficient in practice, as shown 
in our prototype implementation. The analysis results are used to identify and 
eliminate dead code. The framework for representing and analyzing properties 
of recursive data structures using general regular tree grammars applies to other 
analyses as well. 

The rest of the paper is organized as follows. Section 2 describes a pro- 
gramming language with recursive data constructions. Section 3 defines liveness 
patterns that represent partially dead recursive data. Section 4 formulates the 
analysis as solving sets of constraints on grammars. Section 5 presents efficient 
algorithms for computing liveness patterns at every program point. Section 6 
describes dead code elimination, our implementation, and extensions. Section 7 
compares this work with related work and concludes. 



2 Language 



We use a simple first-order functional programming language. The expressions 
of the language are: 



e ::= v                          variable
    | c(e1, ..., en)             constructor application
    | p(e1, ..., en)             primitive function application
    | if e1 then e2 else e3      conditional expression
    | let v = e1 in e2           binding expression
    | f(e1, ..., en)             function application



Each constructor $c$, primitive function $p$, and user-defined function $f$ has a fixed
arity. If a constructor $c$ has arity 0, then we write $c$ instead of $c()$. New
constructors can be declared, together with their arities. When needed, we use $c^n$ to
denote that $c$ has arity $n$. For each constructor $c^n$, there is a primitive function
$c?$ that tests whether the argument is an application of $c$, and if $n > 0$, then
for each $i = 1..n$, there is a primitive function $c_i$ that selects the $i$th component
in an application of $c$, e.g., $c_2(c^3(x, y, z)) = y$. A program is a set of mutually
recursive function definitions of the form:

$$ f(v_1, \ldots, v_n) = e \tag{1} $$



together with a set of constructor declarations. Figure 1 gives some example
definitions, assuming that min and max are primitive functions, and that
constructors $nil^0$, $cons^2$, and $triple^3$ are declared in the programs where they are
used. For ease of reading, we use null instead of $nil?$, car instead of $cons_1$, cdr
instead of $cons_2$, and 1st, 2nd, and 3rd instead of $triple_1$, $triple_2$, and $triple_3$,
respectively.



minmax(x) : compute min and max for all suffixes of x
minmax(x) = if null(x) then nil
            else if null(cdr(x)) then
              cons(triple(car(x), car(x), car(x)), nil)
            else let v = minmax(cdr(x)) in
              cons(triple(car(x),
                          min(car(x), 2nd(car(v))),
                          max(car(x), 3rd(car(v)))),
                   v)

listsecond(x) : list the second element in each triple in x
listsecond(x) = if null(x) then nil
                else cons(2nd(car(x)), listsecond(cdr(x)))

getmin(x) : compute the min elements for all suffixes of x
getmin(x) = listsecond(minmax(x))

len(x) : compute length of x
len(x) = if null(x) then 0
         else 1 + len(cdr(x))

odd(x) : get elements of x at odd positions
odd(x) = if null(x) then nil
         else cons(car(x), even(cdr(x)))

even(x) : get elements of x at even positions
even(x) = if null(x) then nil
          else odd(cdr(x))



Fig. 1. Example function definitions. 
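
For readers who want to run the first three definitions of Figure 1, here is a direct transcription into Python. It is a sketch under our own encoding: lists stand for cons/nil structures and 3-tuples for triples. The function `getmin_without_max` is hand-written by us to show what the program looks like once the max computations are left out; it is not output of the analysis.

```python
def minmax(x):
    """List of (element, min, max) triples, one for each suffix of x."""
    if not x:
        return []
    if not x[1:]:
        return [(x[0], x[0], x[0])]
    v = minmax(x[1:])
    return [(x[0], min(x[0], v[0][1]), max(x[0], v[0][2]))] + v


def listsecond(x):
    """Second component of each triple in x."""
    if not x:
        return []
    return [x[0][1]] + listsecond(x[1:])


def getmin(x):
    """Minimum element of every suffix of x."""
    return listsecond(minmax(x))


def getmin_without_max(x):
    """Hand-specialized variant that never computes a max (illustration only)."""
    if not x:
        return []
    if not x[1:]:
        return [x[0]]
    v = getmin_without_max(x[1:])
    return [min(x[0], v[0])] + v


print(minmax([3, 1, 2]))  # [(3, 1, 3), (1, 1, 2), (2, 2, 2)]
print(getmin([3, 1, 2]))  # [1, 1, 2]
```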



This language has call-by-value semantics. Well-defined expressions evaluate
to constructed data, such as cons(3, nil). We use $\bot$ to denote the value of
undefined (non-terminating) expressions; an expression must evaluate to $\bot$ if any of
its subexpressions evaluates to $\bot$. Since a program can use data constructions
$c(e_1, \ldots, e_n)$ in recursive function definitions, it can build data structures of
unbounded sizes, i.e., sizes not bounded in any way by the size of the program but
determined by particular inputs to the program.

There can be values, which can be subparts of constructed data, computed
by a program that are not needed in obtaining the output of the program. To
improve program efficiency, we can eliminate such dead computations and use a
special symbol _ as a placeholder for their values. A constructor application does
not evaluate to _ even if some arguments evaluate to _. A primitive function
application (or a conditional expression) must evaluate to _, if not $\bot$, if any
of its subexpressions (or the condition, respectively) evaluates to _. Whether
a function application (or a binding expression) evaluates to _ depends on the
values of the arguments (or the bound variable, respectively) and how they are
used in the function definition (or the body, respectively).
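
The evaluation rules for the placeholder _ and the undefined value can be sketched as two small helpers. This is a minimal Python illustration under our own encoding: `BOT` and `DEAD` are marker objects we introduce for the undefined value and for _, and constructed data are tuples whose first element is the constructor name.

```python
class Marker:
    """A named marker value, compared by identity."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


BOT = Marker("BOT")  # value of an undefined (non-terminating) expression
DEAD = Marker("_")   # placeholder for the value of an eliminated computation


def construct(name, *args):
    """Constructor application: undefined if an argument is, but never dead."""
    if any(a is BOT for a in args):
        return BOT
    return (name, *args)


def primitive(op, *args):
    """Primitive application: undefined wins, then dead, else apply op."""
    if any(a is BOT for a in args):
        return BOT
    if any(a is DEAD for a in args):
        return DEAD
    return op(*args)


print(construct("cons", DEAD, ("nil",)))  # ('cons', _, ('nil',))
print(primitive(min, 3, DEAD))            # _
```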

Dead code may exist in a program especially when only certain parts of its 
result or intermediate results are needed. Such parts can be specified by a user 
or determined by how these results are used in computing other values, e.g., by 
how the value of minmax is used in computing getmin, in which case all the 
max operations are dead. 
3 Liveness patterns

We represent partially dead recursive data using liveness patterns. A liveness 
pattern indicates which parts of data must be dead and which parts may be live. 
D indicates that a part must be completely dead, and L indicates that a part 
may be completely live. Partial liveness is represented using constructors. For 
example, cons(D, L) indicates a cons structure with a definitely dead head and
a possibly live tail. Also, nil() indicates the liveness pattern corresponding to
a nil structure, so there is no confusion between a liveness pattern and a data
value. A liveness pattern is a function; when applied to a data value, it returns
the data with the live parts unchanged and the dead parts replaced by _. For
example,

$$ cons(D, cons(L, D))\,(cons(0, cons(1, cons(2, nil)))) = cons(\_, cons(1, \_)). $$

Formally, liveness patterns are domain projections [48,17], which provide a
clean tool for describing substructures of constructed data by projecting out the
parts that are of interest [56,27,39,45,33]. Let $X$ be the domain of all possible
values computed by our programs, including $\bot$ and values containing _. We
define an ordering $\sqsubseteq$ on $X$, where we read $x_1 \sqsubseteq x_2$ as “$x_1$ is more dead than
$x_2$”: for all $x$ in $X$, $\bot \sqsubseteq x$, and for two values $x_1$ and $x_2$ other than $\bot$,

$$ x_1 \sqsubseteq x_2 \ \text{iff}\ x_1 = \_,\ \text{or}\ x_1 = x_2,\ \text{or}\ x_1 = c(x_{11}, \ldots, x_{1n}),\ x_2 = c(x_{21}, \ldots, x_{2n}),\ \text{and}\ x_{1i} \sqsubseteq x_{2i}\ \text{for}\ i = 1..n. \tag{2} $$

A liveness pattern over $X$ is a function $\pi : X \to X$ such that $\pi(x) \sqsubseteq x$ and
$\pi(\pi(x)) = \pi(x)$ for all $x \in X$. $L$ is the identity function: $L(x) = x$. $D$ is the
absence function: $D(x) = \_$ for all $x \neq \bot$, and $D(\bot) = \bot$. $c^n(\pi_1, \ldots, \pi_n)$ is the
function:

$$ c^n(\pi_1, \ldots, \pi_n)(x) = \begin{cases} c^n(\pi_1(x_1), \ldots, \pi_n(x_n)) & \text{if } x = c^n(x_1, \ldots, x_n) \\ \bot & \text{otherwise} \end{cases} \tag{3} $$
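
The three kinds of liveness patterns and the "more dead than" ordering translate almost literally into code. The Python sketch below uses our own encoding (marker objects `BOT` and `DEAD`, tuples for constructed data, and a helper `con` that builds constructor patterns); it follows definitions (2) and (3) and treats a construction with an undefined component as undefined, as the language semantics requires.

```python
class Marker:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


BOT = Marker("BOT")  # undefined value
DEAD = Marker("_")   # placeholder for a dead part


def L(x):
    """Identity pattern: everything may be live."""
    return x


def D(x):
    """Absence pattern: everything is dead (the undefined value stays undefined)."""
    return BOT if x is BOT else DEAD


def con(name, *patterns):
    """Constructor pattern c(p1, ..., pn) as in (3)."""
    def project(x):
        if isinstance(x, tuple) and x[0] == name and len(x) - 1 == len(patterns):
            parts = [p(a) for p, a in zip(patterns, x[1:])]
            return BOT if any(p is BOT for p in parts) else (name, *parts)
        return BOT
    return project


def more_dead(x1, x2):
    """The ordering of (2): x1 is more dead than (or equal to) x2."""
    if x1 is BOT:
        return True
    if x2 is BOT:
        return False
    if x1 is DEAD or x1 == x2:
        return True
    return (isinstance(x1, tuple) and isinstance(x2, tuple)
            and x1[0] == x2[0] and len(x1) == len(x2)
            and all(more_dead(a, b) for a, b in zip(x1[1:], x2[1:])))


value = ("cons", 0, ("cons", 1, ("cons", 2, ("nil",))))
pattern = con("cons", D, con("cons", L, D))
projected = pattern(value)
print(projected)  # ('cons', _, ('cons', 1, _))
```

Applying `pattern` a second time returns the same value, and `more_dead(projected, value)` holds, which are the two conditions required of a liveness pattern.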



Grammar-based liveness patterns. We represent liveness patterns as grammars.
For example, the grammar $S \to nil() \mid cons(D, S)$ projects out a list whose
elements are dead but whose spine is live. It generates the set of sentences
$\{nil(), cons(D, nil()), cons(D, cons(D, nil())), \ldots\}$. Applying each element to a
given value, say, cons(2, cons(4, nil)), yields $\bot$, $\bot$, cons(_, cons(_, nil)), $\bot$, ..., in
which cons(_, cons(_, nil)) is the least upper bound.
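
The least upper bound over all sentences can be computed without enumerating the (infinite) language, by projecting recursively along the productions. The following Python sketch is our own illustration: grammars are dictionaries from nonterminals to lists of right-hand sides, where a right-hand side is "D", "L", or a tuple of a constructor name followed by argument symbols. It assumes only productions of the basic forms, with D and L allowed directly as arguments.

```python
class Marker:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


BOT = Marker("BOT")  # undefined value
DEAD = Marker("_")   # placeholder for a dead part


def lub(a, b):
    """Least upper bound of two projections of the same value."""
    if a is BOT:
        return b
    if b is BOT:
        return a
    if a is DEAD:
        return b
    if b is DEAD:
        return a
    if isinstance(a, tuple) and isinstance(b, tuple):
        return (a[0], *(lub(p, q) for p, q in zip(a[1:], b[1:])))
    return a  # equal atoms


def project(grammar, symbol, x):
    """Apply the liveness pattern generated from `symbol` to the value x."""
    if x is BOT:
        return BOT
    if symbol == "L":
        return x
    if symbol == "D":
        return DEAD
    result = BOT
    for rhs in grammar[symbol]:
        if rhs in ("D", "L"):
            candidate = project(grammar, rhs, x)
        elif isinstance(x, tuple) and x[0] == rhs[0] and len(x) == len(rhs):
            parts = [project(grammar, s, a) for s, a in zip(rhs[1:], x[1:])]
            candidate = BOT if any(p is BOT for p in parts) else (rhs[0], *parts)
        else:
            candidate = BOT
        result = lub(result, candidate)
    return result


SPINE = {"S": [("nil",), ("cons", "D", "S")]}  # S -> nil() | cons(D, S)
value = ("cons", 2, ("cons", 4, ("nil",)))
spine_only = project(SPINE, "S", value)
print(spine_only)  # ('cons', _, ('cons', _, ('nil',)))
```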

Formally, the grammars we use for describing liveness patterns are regular
tree grammars [16], which allow bounded, and often precise, representations
of unbounded data [23,38,39,2,51,9,45]. A regular-tree-grammar-based liveness
pattern $G$ is a quadruple $(\mathcal{T}, \mathcal{N}, \mathcal{P}, S)$, where $\mathcal{T}$ is a set of terminal symbols
including $L$, $D$, and all possible constructors $c$, $\mathcal{N}$ is a set of nonterminal symbols
$N$, $\mathcal{P}$ is a set of productions of the form:

$$ N \to D, \quad N \to L, \quad \text{or} \quad N \to c^n(N_1, \ldots, N_n), \tag{4} $$
and nonterminal $S$ is the start symbol. So, our liveness patterns use general
regular tree grammars [23,16,9] extended with the special constants $D$ and $L$.
The language $\mathcal{L}_G$ generated by $G$ is the set $\{\pi \in \mathcal{T}^* \mid S \Rightarrow^* \pi\}$ of sentences. The
projection function that $G$ represents is:

$$ G(x) = \sqcup \{\pi(x) \mid \pi \in \mathcal{L}_G\} \tag{5} $$

where $\sqcup$ is the least upper bound operation for $\sqsubseteq$. It is easy to see that $G(x)$ is
well-defined for all $x \in X$. We overload $L$ to denote the grammar that generates
sentence $L$, and overload $D$ to denote the grammar that generates only sentence
$D$. For ease of presentation, when no confusion arises, we write grammars in
compact forms. For example, $\{S \to nil() \mid cons(N, S),\ N \to triple(D, L, D)\}$,
where $\mid$ denotes alternation, projects out a list whose elements are triples whose
first and third components are dead.

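We extend regular tree grammars to allow productions of the form:

$$ N \to N', \quad N \to c_i(N'), \quad \text{or} \quad N \to [N']R' \tag{6} $$

for $R'$ of the form $L$, $c^n(N_1, \ldots, N_n)$, or $N''$, and we define:

$$ c_i(\pi) = \begin{cases} L & \text{if } \pi = L \\ \pi_i & \text{if } \pi = c^n(\pi_1, \ldots, \pi_n) \\ D & \text{otherwise} \end{cases} \qquad \text{and} \qquad [\pi]\pi' = \begin{cases} D & \text{if } \pi = D \\ \pi' & \text{otherwise} \end{cases} \tag{7} $$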



These extended forms are for convenience later; the selector form in the middle 
of (6) is the same as that first used by Jones and Muchnick [23], and the con- 
ditional form on the right of (6) is for similar purposes as those used in several 
other analyses, e.g., the operator $\triangleright$ used by Wadler and Hughes for strictness
analysis [56]. Given an extended regular tree grammar $G$ that contains
productions of the forms in (4) and (6), we can construct a regular tree grammar $G'$
that contains only productions of the form (4) such that $\mathcal{L}_G = \mathcal{L}_{G'}$, i.e., $G'$ and
$G$ represent the same projection function; an algorithm is given in Section 5.

When using a grammar G, what matters is the projection function that G 
represents. In fact, different grammars can represent the same projection func- 
tion. A basic idea of this work is to capture the information of interest — liveness 
patterns — using grammars that are constructed based on program semantics and 
then simplify the grammars to equivalent grammars in simpler forms where, in
particular, the only grammar that represents $D$ is $\{S \to D\}$.

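We define an ordering $\le$ on regular-tree-grammar-based liveness patterns.
For two grammars $G_1$ and $G_2$, we define:

$$ G_1 \le G_2 \ \text{iff}\ \forall \pi_1 \in \mathcal{L}_{G_1},\ \exists \pi_2 \in \mathcal{L}_{G_2},\ \pi_1 \le \pi_2, \tag{8} $$

where for two sentences $\pi_1$ and $\pi_2$, we define (overloading $\le$):

$$ \pi_1 \le \pi_2 \ \text{iff}\ \pi_1 = D,\ \text{or}\ \pi_2 = L,\ \text{or}\ \pi_1 = c(\pi_{11}, \ldots, \pi_{1n}),\ \pi_2 = c(\pi_{21}, \ldots, \pi_{2n}),\ \text{and}\ \pi_{1i} \le \pi_{2i}\ \text{for}\ i = 1..n. \tag{9} $$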
For convenience, we define $G_1 \ge G_2$ if and only if $G_2 \le G_1$. It is easy to see that

$$ \text{if } G_1 \le G_2, \text{ then } \forall x,\ G_1(x) \sqsubseteq G_2(x). \tag{10} $$

This means that if $G_1 \le G_2$, then $G_1$ projects out values that are more dead than
those that $G_2$ projects out. This is a basis of our correctness proof. The converse
is not true, e.g., $G_1 = \{S \to cons(L, L)\}$ and $G_2 = \{S \to cons(L, D) \mid cons(D, L)\}$
form a counterexample for the converse, but this converse is not used. Note that
this ordering on grammars does not form a complete partial order.
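
For grammars with finitely many sentences, definitions (8) and (9) can be checked by brute force. The Python sketch below is an illustration under our own encoding: a sentence is "D", "L", or a tuple of a constructor name followed by argument sentences, and a finite language is simply a list of sentences. It confirms that the two grammars of the counterexample are not related in the direction $G_1 \le G_2$, even though both project cons(x1, x2) to itself once the least upper bound is taken.

```python
def leq_sentence(p1, p2):
    """Ordering (9) on sentences."""
    if p1 == "D" or p2 == "L":
        return True
    return (isinstance(p1, tuple) and isinstance(p2, tuple)
            and p1[0] == p2[0] and len(p1) == len(p2)
            and all(leq_sentence(a, b) for a, b in zip(p1[1:], p2[1:])))


def leq_language(lang1, lang2):
    """Ordering (8), restricted to finite languages given as lists of sentences."""
    return all(any(leq_sentence(p1, p2) for p2 in lang2) for p1 in lang1)


G1 = [("cons", "L", "L")]                      # S -> cons(L, L)
G2 = [("cons", "L", "D"), ("cons", "D", "L")]  # S -> cons(L, D) | cons(D, L)

print(leq_language(G1, G2))  # False
print(leq_language(G2, G1))  # True
```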

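Notation. We use CON to denote the grammar that projects any constructor
but none of its arguments: letting $\mathcal{T}_c$ be the set of all possible constructors,

$$ CON = (\mathcal{T}_c \cup \{D\},\ \{S\},\ \{S \to c^n(\underbrace{D, \ldots, D}_{n}) \mid c^n \in \mathcal{T}_c\},\ S).{}^{2} \tag{11} $$

Given a grammar $G = (\mathcal{T}, \mathcal{N}, \mathcal{P}, S)$, we use $con_{c,i}(G)$ to denote using $G$ as the
$i$th component of a $c^n$ structure ($i \le n$) whose other components are dead:
assuming $S'$ is a nonterminal not used in $G$,

$$ con_{c,i}(G) = (\mathcal{T},\ \mathcal{N} \cup \{S'\},\ \mathcal{P} \cup \{S' \to c^n(\underbrace{D, \ldots, D}_{i-1}, S, \underbrace{D, \ldots, D}_{n-i})\},\ S') \tag{12} $$

and we use $sel_{c,i}(G)$ to denote the part of $G$ corresponding to the $i$th component
of a $c^n$ structure ($i \le n$): assuming $S'$ is a nonterminal not used in $G$,

$$ sel_{c,i}(G) = (\mathcal{T},\ \mathcal{N} \cup \{S'\},\ \mathcal{P} \cup \{S' \to c_i(S)\},\ S'). \tag{13} $$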



For example, if nil and cons are all possible constructors where a liveness pattern
CON is used, as we assume for functions len, odd, and even, then $CON = \{S \to
nil() \mid cons(D, D)\}$. If $G = \{S \to L\}$, then $con_{cons,1}(G) = \{S' \to cons(S, D),\ S \to
L\}$ and $sel_{cons,1}(G) = \{S' \to car(S),\ S \to L\}$. Finally, we define a conditional:

$$ cond(G_1, G_2) = \begin{cases} D & \text{if } G_1 = D \\ G_2 & \text{otherwise} \end{cases} \tag{14} $$
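
The three grammar operations used in the constraints can be sketched on a simple data representation. In the Python below, which is only an illustration, a grammar is a pair of a production dictionary and a start symbol; the fresh nonterminal name "S'" is assumed not to occur in the argument grammar, as in the text; a selector production is encoded by us as a tuple tagged "sel"; and the dead grammar is assumed to be in its simplified form with the single production S -> D.

```python
DEAD_GRAMMAR = ({"S": ["D"]}, "S")  # the grammar {S -> D}


def con(c, i, n, grammar, fresh="S'"):
    """con_{c,i}(G): G as the i-th of n components of c, all others dead."""
    productions, start = grammar
    rhs = (c, *(["D"] * (i - 1)), start, *(["D"] * (n - i)))
    return ({**productions, fresh: [rhs]}, fresh)


def sel(c, i, grammar, fresh="S'"):
    """sel_{c,i}(G): the part of G for the i-th component of a c structure."""
    productions, start = grammar
    return ({**productions, fresh: [("sel", c, i, start)]}, fresh)


def cond(g1, g2):
    """cond(G1, G2): dead if G1 is the dead grammar, G2 otherwise."""
    return DEAD_GRAMMAR if g1 == DEAD_GRAMMAR else g2


LIVE = ({"S": ["L"]}, "S")  # the grammar {S -> L}
print(con("cons", 1, 2, LIVE))  # ({'S': ['L'], "S'": [('cons', 'S', 'D')]}, "S'")
print(sel("cons", 1, LIVE))     # ({'S': ['L'], "S'": [('sel', 'cons', 1, 'S')]}, "S'")
```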



4 Analysis of liveness patterns using constraints

Dead code analysis computes liveness patterns associated with values at program 
points, such as function definitions, parameters, and subexpressions. We develop 
such a backward dependence analysis. Given liveness patterns associated with 
certain program points, it computes liveness patterns at all program points, so 
that the liveness specified by the given liveness patterns is guaranteed. The basic 
idea is that a liveness pattern associated with a program point is constrained 
by liveness patterns associated with other points based on the semantics of the 
program segments involved. 

^2 For convenience, we fold D into the right sides of the productions.
Sufficiency conditions. The resulting liveness patterns must satisfy two kinds
of sufficiency conditions. First, the resulting grammar at a program point must
project out values that are more live than required by the given grammar (if any)
associated with that point. Precisely, at each subexpression $e$ where a liveness
pattern is given, if the given grammar is $G_0$, and the resulting grammar is $G$,
then $G_0(e) \sqsubseteq G(e)$ for all values of the free variables in $e$. Second, the resulting
grammars must satisfy the constraints determined by the program semantics.
Precisely, assume a resulting grammar is associated with each parameter and
each subexpression of all function definitions. Let $G{:}e$ denote that grammar $G$ is
associated with $e$. Then (1) the liveness patterns at function parameters must
be sufficient to guarantee the liveness pattern at the function return, i.e., for
each definition of the form $f(G_1{:}v_1, \ldots, G_n{:}v_n) = G{:}e$, the following sufficiency
condition must be satisfied for all values $v_1, \ldots, v_n$:

$$ G(f(v_1, \ldots, v_n)) \sqsubseteq f(G_1(v_1), \ldots, G_n(v_n)) \tag{15} $$

and (2) the liveness pattern at each subexpression must be sufficient to guarantee
the liveness pattern at the enclosing expression, i.e., for each subexpression that
is of a form in the left column below, the corresponding sufficiency condition
in the right column must be satisfied for all values of the free variables in the
subexpression:

$$ \begin{array}{ll}
G{:}c(G_1{:}e_1, \ldots, G_n{:}e_n) & G(c(e_1, \ldots, e_n)) \sqsubseteq c(G_1(e_1), \ldots, G_n(e_n)) \\
G{:}c_i(G'{:}e') & G(c_i(e')) \sqsubseteq c_i(G'(e')) \\
G{:}c?(G'{:}e') & G(c?(e')) \sqsubseteq c?(G'(e')) \\
G{:}q(G_1{:}e_1, \ldots, G_n{:}e_n) \text{ if } q \text{ is } p \text{ other than } c_i \text{ or } c? & G(q(e_1, \ldots, e_n)) \sqsubseteq q(G_1(e_1), \ldots, G_n(e_n)) \\
G{:}\text{if } G_1{:}e_1 \text{ then } G_2{:}e_2 \text{ else } G_3{:}e_3 & G(\text{if } e_1 \text{ then } e_2 \text{ else } e_3) \sqsubseteq \text{if } G_1(e_1) \text{ then } G_2(e_2) \text{ else } G_3(e_3) \\
G{:}\text{let } u = G_1{:}e_1 \text{ in } G_2{:}e_2 & G(\text{let } u = e_1 \text{ in } e_2) \sqsubseteq \text{let } u = G_1(e_1) \text{ in } G_2(e_2) \\
G{:}f(G_1{:}e_1, \ldots, G_n{:}e_n) & G(f(e_1, \ldots, e_n)) \sqsubseteq f(G_1(e_1), \ldots, G_n(e_n))
\end{array} $$

Note that no approximation is made in these conditions. For example, the
condition of a conditional expression does not have to be evaluated, so it does not
have to be associated with $L$. In particular, the liveness patterns associated with
a function application are not related to liveness patterns associated with the
definition of the function, and thus, different applications of the same function
may require different parts of the function definition to be live. For example,
consider functions $f$ and $g$ below:

f(x, y) = if x > 0 then x else y        g(z) = ... 2) + ... 2)

Given $G_0 = L$ at the definition of $g$, the liveness patterns associated with all
program points, where the liveness patterns not explicitly written are all $L$,
satisfy the sufficiency conditions. Note that the two calls to $f$ need different
parts of $f$ to be live.

Grammar constraints. Given liveness patterns associated with certain
subexpressions, we construct a set of constraints on the resulting liveness patterns
that guarantee the sufficiency conditions. First, at each subexpression $e$ where a
liveness pattern is given, if the given grammar is $G_0$, and the resulting grammar
is $G$, then we construct $G \ge G_0$. Second, for a function definition of the form
$f(G_1{:}v_1, \ldots, G_n{:}v_n) = e$, we construct, for $i = 1..n$ and for each occurrence
$G'{:}v_i$ in $e$, the constraint

$$ G_i \ge G' \tag{16} $$

and, for each subexpression of $e$ that is of a form in the left column of Figure 2, we
construct the corresponding constraints in the right column. These constraints
make approximations while guaranteeing the sufficiency conditions, as explained
below.

$$ \begin{array}{lll}
(1) & G{:}c(G_1{:}e_1, \ldots, G_n{:}e_n) & G_i \ge sel_{c,i}(G) \text{ for } i = 1..n \\
(2) & G{:}c_i(G'{:}e') & G' \ge cond(G, con_{c,i}(G)) \\
(3) & G{:}c?(G'{:}e') & G' \ge cond(G, CON) \\
(4) & G{:}q(G_1{:}e_1, \ldots, G_n{:}e_n) \text{ if } q \text{ is } p \text{ other than } c_i \text{ or } c? & G_i \ge cond(G, L) \text{ for } i = 1..n \\
(5) & G{:}\text{if } G_1{:}e_1 \text{ then } G_2{:}e_2 \text{ else } G_3{:}e_3 & G_1 \ge cond(G, L),\ G_2 \ge G,\ G_3 \ge G \\
(6) & G{:}\text{let } u = G_1{:}e_1 \text{ in } G_2{:}e_2 & G_1 \ge cond(G, G') \text{ for each occurrence of } G'{:}u \text{ in } e_2,\ G_2 \ge G \\
(7) & G{:}f(G_1{:}e_1, \ldots, G_n{:}e_n) \text{ where } f(G'_1{:}v_1, \ldots, G'_n{:}v_n) = G'{:}e & G_i \ge cond(G, G'_i) \text{ for } i = 1..n,\ G' \ge G
\end{array} $$

Fig. 2. Grammar constraints for expressions.



Formula (16) for function definitions requires that the liveness pattern at
formal parameter $v_i$ be greater than or equal to the liveness patterns at all
uses of $v_i$. Rule (7) for function calls requires that, for all non-dead calls of the
same function, the liveness patterns at the arguments be greater than or equal 
to the liveness patterns at the corresponding formal parameters, and that the 
liveness pattern for the return value of the function be greater than or equal 
to the liveness patterns at all calls. Thus, if a function call is dead, then all its 
arguments are also dead, even though the formal parameters of the function may 
not be dead due to other calls to the same function. 

Other constraints are based on the semantics of each construct locally. Rules
(1)-(3) handle data constructions. Rule (1) says that the liveness pattern at a
component of a construction must be no less than the corresponding component in
the liveness pattern at the result of the construction. As a special case of (1),
for any constructor of arity 0, no constraint is added. Rule (2) requires that,
if the result of a selection by $c_i$ is not dead, then the argument be as live as a
construction using $c$ whose $i$th component is as live as the result of the selection.
Rule (3) says that, if the result of an application of a tester is not dead, then the
liveness pattern at the argument needs to project out the outermost constructor
but none of the components. Rule (4) says that, if the result of a primitive
operation is not dead, then each argument must be live. If we assume that primitive
functions are defined only on primitive types such as Boolean and integer, then
we could use CON in place of $L$ in the constraints. Rule (5) requires that the
condition be live if the result of the entire conditional expression is not dead, and
that both branches be as live as the result. Again, we could use CON in place
of $L$ as a sufficient context for $e_1$; furthermore, if $e_2$ equals $e_3$, in fact as long
as $G(e_2)$ equals $G(e_3)$, then we could use $D$ in place of $L$ as a sufficient context
for $e_1$ and thus no constraint for $G_1$ would be needed. Rule (6) is similar to a
function call, since it equals an application of $\lambda u.\, e_2$ to $e_1$. It requires that the
defining expression $e_1$ be as live as all its uses in the body, and that the body
be as live as the result.

We can show by a standard inductive argument that the constraints for each 
construct guarantee sufficient information, and thus an inductive proof shows 
that the sufficiency conditions are satisfied. 

5 Construction and simplification of liveness patterns 

We describe a straightforward method for building minimum grammars that 
satisfy the above constraints; these grammars may contain productions of the 
extended forms in (6). Then, we simplify the grammars by eliminating extended 
forms; this makes explicit whether the grammar associated with a program point 
equals dead. 

Constructing the grammars. Let $\mathcal{T}$ be the set of all possible
constructors in the program. Let $\mathcal{N}_0$ be the set of nonterminals used
in the given liveness patterns associated with selected subexpressions. We
associate a unique nonterminal, not in $\mathcal{N}_0$, with each parameter and
each subexpression of all function definitions. Then we add productions using
these terminals and nonterminals. Finally, the resulting grammar at a program
point is formed by using these terminals, nonterminals, and productions, and by
using the nonterminal associated with that point as the start symbol.

We add two kinds of productions. First, for each subexpression $e$ where a
grammar $G_0$ is given, let $N_0$ be the start symbol of $G_0$, and let $N$ be
the nonterminal associated with $e$. We add $N \to N_0$ as well as all
productions in $G_0$. Second, for each function definition
$f({}^{N_1:}v_1, \ldots, {}^{N_n:}v_n) = e$, we add, for each $i = 1..n$ and for
each occurrence ${}^{N':}v_i$ in $e$, the production

$N_i \to N'$ (17)

and, for each subexpression of $e$ that is of a form in the left column of
Figure 3, the corresponding productions in the right column.

It is easy to show that the resulting grammars satisfy the grammar constraints
in Figure 2 and thus give sufficient information at every program point. To show
this, simply notice that the productions in Figure 3 can be obtained from the
constraints in Figure 2 by replacing $G$ with $N$ and $\geq$ with $\to$, and by
replacing grammar operations with the corresponding productions based on
definitions: $sel_{c,i}(G)$ with $c_i^{-1}(N)$, $con_{c,i}(G)$ with
$c(D, \ldots, D, N, D, \ldots, D)$, $CON$ with $c(D, \ldots, D)$ for all $c$, and
$cond(G_1, G_2)$ with $[N_1]N_2$. Thus, each production constructed here
guarantees exactly a corresponding grammar constraint in Figure 2 simply by
definitions. Furthermore, the resulting grammars are minimal among all solutions
that use the same set of nonterminals, and they give minimum sufficient
information. To see this, notice that a smaller grammar at any point would make
the nonterminal at that point correspond to a smaller grammar than the grammar
generated by the right hand side(s) of the nonterminal, violating the
corresponding grammar constraints.

(1) ${}^{N:}c({}^{N_1:}e_1, \ldots, {}^{N_n:}e_n)$
        $N_i \to c_i^{-1}(N)$ for $i = 1..n$

(2) ${}^{N:}c_i^{-1}({}^{N':}e)$
        $N' \to [N]\,c(D, \ldots, D, N, D, \ldots, D)$
        with $i-1$ $D$'s before $N$ and $n-i$ $D$'s after $N$

(3) ${}^{N:}c?({}^{N':}e)$
        $N' \to [N]\,c'(D, \ldots, D)$
        with $n$ $D$'s, for each possible constructor $c'$ of arity $n$

(4) ${}^{N:}q({}^{N_1:}e_1, \ldots, {}^{N_n:}e_n)$ if $q$ is $p$ other than $c_i^{-1}$ or $c?$
        $N_i \to [N]\,L$ for $i = 1..n$

(5) ${}^{N:}$if ${}^{N_1:}e_1$ then ${}^{N_2:}e_2$ else ${}^{N_3:}e_3$
        $N_1 \to [N]\,L$, $N_2 \to N$, $N_3 \to N$

(6) ${}^{N:}$let $u = {}^{N_1:}e_1$ in ${}^{N_2:}e_2$
        $N_1 \to [N]\,N'$ for each occurrence ${}^{N':}u$ in $e_2$, $N_2 \to N$

(7) ${}^{N:}f({}^{N_1:}e_1, \ldots, {}^{N_n:}e_n)$ where $f({}^{N'_1:}v_1, \ldots, {}^{N'_n:}v_n) = {}^{N':}e$
        $N_i \to [N]\,N'_i$ for $i = 1..n$, $N' \to N$

Fig. 3. Productions added for expressions. For each rule, the first line is the
expression form (left column) and the following lines give the productions
(right column).

Let $n$ denote the size of the program. Assume that the maximum arity of
constructors, primitive functions, and user-defined functions is bounded by a
constant. Since a constant number of productions are added at each program
point, the above construction takes $O(n)$ time.

Example 1. For functions len, odd, and even in Figure 1, the nonterminals
labeling the program points and the added productions are shown in Figure 4. For
example, we have $N_{12} \to [N_{13}]\,cons(N_{13}, N_0)$, where $N_0 \to D$ is
the last production on the last line. It means that $N_{12}$ is conditioned on
$N_{13}$: if $N_{13}$ is $D$, so is $N_{12}$; otherwise $N_{12}$ is
$cons(N_{13}, N_0)$, i.e., it projects out a cons structure, the first component
of which is projected out by $N_{13}$. Suppose we need the result of len; we add
$N_{28} \to L$, since $N_{28}$ corresponds to the return value of len. Suppose
we need the result of odd; we add $N_{18} \to L$. Suppose we need to know
whether the result of odd is nil or cons; we add $N_{18} \to nil()$,
$N_{18} \to cons(D, D)$.
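The construction of productions can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation (which the paper says is written in SSL and STk): the tuple encoding of labelled expressions, the triple encoding `(lhs, guard, rhs)` of productions, and all function names are hypothetical. Rule (6) for let is omitted for brevity, the tester null is treated as ranging over the two constructors nil and cons, and `N0` stands for the nonterminal whose only production is $N_0 \to D$.

```python
CONSTRUCTORS = {"nil": 0, "cons": 2}  # assumed constructors and their arities
D = "N0"                              # N0 is the nonterminal with N0 -> D


def nt(label):
    return f"N{label}"


def productions(defs):
    """defs maps a function name to (params, body).

    params is a list of (label, name) pairs; body is a labelled expression
    encoded as (label, kind, ...).  A production is (lhs, guard, rhs), where
    guard is None or the nonterminal N' of a conditional form [N']R'.
    """
    prods = set()

    def walk(e, params):
        label, kind, *rest = e
        n = nt(label)
        if kind == "var":                      # production (17): N_i -> N'
            prods.add((params[rest[0]], None, ("nt", n)))
        elif kind == "con":                    # rule (1): selector form
            c, args = rest
            for i, a in enumerate(args, start=1):
                prods.add((nt(a[0]), None, ("sel", c, i, n)))
                walk(a, params)
        elif kind == "sel":                    # rule (2)
            c, i, a = rest
            comps = tuple(n if k == i else D
                          for k in range(1, CONSTRUCTORS[c] + 1))
            prods.add((nt(a[0]), n, ("con", c, comps)))
            walk(a, params)
        elif kind == "test":                   # rule (3)
            (a,) = rest
            for c, arity in CONSTRUCTORS.items():
                prods.add((nt(a[0]), n, ("con", c, (D,) * arity)))
            walk(a, params)
        elif kind == "prim":                   # rule (4)
            (args,) = rest
            for a in args:
                prods.add((nt(a[0]), n, ("L",)))
                walk(a, params)
        elif kind == "if":                     # rule (5)
            e1, e2, e3 = rest
            prods.add((nt(e1[0]), n, ("L",)))
            prods.add((nt(e2[0]), None, ("nt", n)))
            prods.add((nt(e3[0]), None, ("nt", n)))
            for sub in (e1, e2, e3):
                walk(sub, params)
        elif kind == "call":                   # rule (7)
            f, args = rest
            f_params, f_body = defs[f]
            for (p_label, _), a in zip(f_params, args):
                prods.add((nt(a[0]), n, ("nt", nt(p_label))))
                walk(a, params)
            prods.add((nt(f_body[0]), None, ("nt", n)))

    for f_params, body in defs.values():
        walk(body, {name: nt(label) for label, name in f_params})
    return prods


# The function len with the program-point labels used in Example 1.
LEN_DEF = {"len": ([(29, "x")],
    (28, "if",
        (27, "test", (26, "var", "x")),
        (25, "prim", []),
        (24, "prim", [(23, "prim", []),
                      (22, "call", "len",
                       [(21, "sel", "cons", 2, (20, "var", "x"))])])))}

LEN_PRODS = productions(LEN_DEF)
```

Under this encoding, `LEN_PRODS` contains twelve productions, among them `("N20", "N21", ("con", "cons", ("N0", "N21")))`, which reads as $N_{20} \to [N_{21}]\,cons(N_0, N_{21})$. One production is emitted per rule instance, which reflects the constant amount of work per program point behind the $O(n)$ bound.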

len(${}^{29:}x$) = ${}^{28:}$if ${}^{27:}$null(${}^{26:}x$) then ${}^{25:}0$
          else ${}^{24:}\;{}^{23:}1$ + ${}^{22:}$len(${}^{21:}$cdr(${}^{20:}x$))

odd(${}^{19:}x$) = ${}^{18:}$if ${}^{17:}$null(${}^{16:}x$) then ${}^{15:}$nil
          else ${}^{14:}$cons(${}^{13:}$car(${}^{12:}x$), ${}^{11:}$even(${}^{10:}$cdr(${}^{9:}x$)))

even(${}^{8:}x$) = ${}^{7:}$if ${}^{6:}$null(${}^{5:}x$) then ${}^{4:}$nil
          else ${}^{3:}$odd(${}^{2:}$cdr(${}^{1:}x$))

$N_{29} \to N_{26}$, $N_{29} \to N_{20}$, $N_{26} \to [N_{27}]\,cons(N_0, N_0)$, $N_{26} \to [N_{27}]\,nil()$, $N_{27} \to [N_{28}]\,L$, $N_{25} \to N_{28}$,

$N_{23} \to [N_{24}]\,L$, $N_{20} \to [N_{21}]\,cons(N_0, N_{21})$, $N_{21} \to [N_{22}]\,N_{29}$, $N_{28} \to N_{22}$, $N_{22} \to [N_{24}]\,L$, $N_{24} \to N_{28}$,

$N_{19} \to N_{16}$, $N_{19} \to N_{12}$, $N_{19} \to N_9$, $N_{16} \to [N_{17}]\,cons(N_0, N_0)$, $N_{16} \to [N_{17}]\,nil()$, $N_{17} \to [N_{18}]\,L$,

$N_{15} \to N_{18}$, $N_{12} \to [N_{13}]\,cons(N_{13}, N_0)$, $N_{13} \to car(N_{14})$, $N_9 \to [N_{10}]\,cons(N_0, N_{10})$, $N_{10} \to [N_{11}]\,N_8$,

$N_7 \to N_{11}$, $N_{11} \to cdr(N_{14})$, $N_{14} \to N_{18}$,

$N_8 \to N_5$, $N_8 \to N_1$, $N_5 \to [N_6]\,cons(N_0, N_0)$, $N_5 \to [N_6]\,nil()$, $N_6 \to [N_7]\,L$, $N_4 \to N_7$,

$N_1 \to [N_2]\,cons(N_0, N_2)$, $N_2 \to [N_3]\,N_{19}$, $N_{18} \to N_3$, $N_3 \to N_7$, $N_0 \to D$

Fig. 4. Productions constructed for the example functions.

Simplifying the grammars. The grammars obtained above may contain productions
of the extended forms in (6) and thus be difficult to understand and use. We
simplify the grammars by removing extended forms using an iterative algorithm
given in Figure 5. After the simplification, nonterminals that do not appear on
the left side of a production with $L$ or $c(N_1, \ldots, N_n)$ on the right
side are implied to derive only $D$. We can read off the grammar at any function
parameter or subexpression by starting at the associated nonterminal and
collecting all productions whose left sides are reachable from this start
symbol. The correctness of this algorithm is based on the definitions of the
extended forms and can be proved in a similar way to when only the selector form
is used [23].

Nonterminals are associated with program points, so there are $O(n)$ of them.
Each step adds a production of the form $N \to L$, $N \to c(N_1, \ldots, N_k)$,
or $N \to N'$. Since each right side of the form $c(N_1, \ldots, N_k)$ is among
the right sides of the originally constructed grammar, there are at most $O(n)$
of them. Thus, for each nonterminal, at most $O(n)$ productions are added, so in
total at most $O(n^2)$ productions are added. Adding a production has $O(n)$
cost to check what other productions to add. Thus, the overall simplification
takes $O(n^3)$ time. Although this appears expensive, the analysis is very fast
in practice, as shown by our prototype implementation.



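input: a grammar $(\mathcal{T}, \mathcal{N}, \mathcal{P}, S)$

/* assume $R$ is of the form $L$ or $c(N_1, \ldots, N_n)$, and $R'$ is of the form $L$, $c(N_1, \ldots, N_n)$, or $N''$ */

repeat

    if $\mathcal{P}$ contains $N \to N'$ and $N' \to R$, then add $N \to R$ to $\mathcal{P}$;

    if $\mathcal{P}$ contains $N \to c_i^{-1}(N')$ and $N' \to L$, then add $N \to L$ to $\mathcal{P}$;

    if $\mathcal{P}$ contains $N \to c_i^{-1}(N')$ and $N' \to c(N_1, \ldots, N_n)$, then add $N \to N_i$ to $\mathcal{P}$;

    if $\mathcal{P}$ contains $N \to [N']R'$ and $N' \to R$, then add $N \to R'$ to $\mathcal{P}$;

until no more productions can be added;
remove all productions of the extended forms from $\mathcal{P}$;
return simplified grammar $(\mathcal{T}, \mathcal{N}, \mathcal{P}, S)$

Fig. 5. Algorithm for simplifying the grammars.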
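The four rules of Figure 5 can be tried out with a naive fixed-point loop. The following Python sketch is an illustration under assumptions, not the authors' STk code: productions are encoded as hypothetical triples `(lhs, guard, rhs)`, where `guard` is `None` or the nonterminal $N'$ of a conditional form $[N']R'$, and `rhs` is `("L",)`, `("con", c, args)`, `("nt", N)`, or `("sel", c, i, N)`. The nonterminal `N0` has no regular production, so it derives only $D$. The input is the set of productions for len from Figure 4 together with $N_{28} \to L$.

```python
L = ("L",)


def con(c, *args):
    return ("con", c, args)


def nt(n):
    return ("nt", n)


# Productions for len (Fig. 4), plus N28 -> L because the result is needed.
LEN = {
    ("N29", None, nt("N26")), ("N29", None, nt("N20")),
    ("N26", "N27", con("cons", "N0", "N0")), ("N26", "N27", con("nil")),
    ("N27", "N28", L), ("N25", None, nt("N28")), ("N24", None, nt("N28")),
    ("N23", "N24", L), ("N20", "N21", con("cons", "N0", "N21")),
    ("N21", "N22", nt("N29")), ("N28", None, nt("N22")),
    ("N22", "N24", L),
    ("N28", None, L),
}


def is_regular(rhs):
    """R in Fig. 5: L or c(N1, ..., Nn)."""
    return rhs[0] in ("L", "con")


def simplify(prods):
    prods = set(prods)
    changed = True
    while changed:
        changed = False
        # Regular right sides currently known for each nonterminal.
        regular = {}
        for lhs, guard, rhs in prods:
            if guard is None and is_regular(rhs):
                regular.setdefault(lhs, set()).add(rhs)
        new = set()
        for lhs, guard, rhs in prods:
            if guard is not None:                 # N -> [N']R' and N' -> R
                if regular.get(guard):
                    new.add((lhs, None, rhs))
            elif rhs[0] == "nt":                  # N -> N' and N' -> R
                new |= {(lhs, None, r) for r in regular.get(rhs[1], ())}
            elif rhs[0] == "sel":                 # N -> c_i^{-1}(N')
                _, c, i, src = rhs
                for r in regular.get(src, ()):
                    if r[0] == "L":
                        new.add((lhs, None, L))
                    elif r[1] == c:
                        new.add((lhs, None, nt(r[2][i - 1])))
        if not new <= prods:
            prods |= new
            changed = True
    # Remove all productions of the extended forms.
    return {p for p in prods if p[1] is None and is_regular(p[2])}


SIMPLIFIED_LEN = simplify(LEN)
```

Each pass rescans every production, so this loop is slower than necessary; it is meant only to make the rules concrete. On this input it yields fifteen regular productions, for instance `("N20", None, ("con", "cons", ("N0", "N21")))`, i.e. $N_{20} \to cons(N_0, N_{21})$.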







Y.A. Liu and S.D. Stoller 



Example 2. Suppose we need the result of len and therefore added $N_{28} \to L$; we
obtain the productions

$N_{29} \to nil()$, $N_{29} \to cons(N_0, N_0)$, $N_{29} \to cons(N_0, N_{21})$, $N_{28} \to L$, $N_{27} \to L$, $N_{26} \to nil()$,

$N_{26} \to cons(N_0, N_0)$, $N_{25} \to L$, $N_{24} \to L$, $N_{23} \to L$, $N_{22} \to L$, $N_{21} \to nil()$, $N_{21} \to cons(N_0, N_0)$,

$N_{21} \to cons(N_0, N_{21})$, $N_{20} \to cons(N_0, N_{21})$

Suppose we need the result of odd and therefore added $N_{18} \to L$; we obtain the
productions

$N_{19} \to cons(N_0, N_0)$, $N_{19} \to nil()$, $N_{19} \to cons(N_{13}, N_0)$, $N_{19} \to cons(N_0, N_{10})$, $N_{18} \to L$,

$N_{17} \to L$, $N_{16} \to nil()$, $N_{16} \to cons(N_0, N_0)$, $N_{15} \to L$, $N_{14} \to L$, $N_{13} \to L$, $N_{12} \to cons(N_{13}, N_0)$,

$N_{11} \to L$, $N_{10} \to nil()$, $N_{10} \to cons(N_0, N_0)$, $N_{10} \to cons(N_0, N_2)$, $N_9 \to cons(N_0, N_{10})$,

$N_8 \to cons(N_0, N_0)$, $N_8 \to nil()$, $N_8 \to cons(N_0, N_2)$, $N_7 \to L$, $N_6 \to L$, $N_5 \to nil()$,

$N_5 \to cons(N_0, N_0)$, $N_4 \to L$, $N_3 \to L$, $N_2 \to cons(N_0, N_{10})$, $N_2 \to cons(N_{13}, N_0)$, $N_2 \to nil()$,

$N_2 \to cons(N_0, N_0)$, $N_1 \to cons(N_0, N_2)$

Suppose we added $N_{18} \to nil()$, $N_{18} \to cons(D, D)$; we obtain the productions

$N_{19} \to cons(N_0, N_0)$, $N_{19} \to nil()$, $N_{18} \to nil()$, $N_{18} \to cons(N_0, N_0)$, $N_{17} \to L$, $N_{16} \to nil()$,

$N_{16} \to cons(N_0, N_0)$, $N_{15} \to cons(N_0, N_0)$, $N_{15} \to nil()$, $N_{14} \to cons(N_0, N_0)$, $N_{14} \to nil()$

In each case, other nonterminals derive only $D$.

The resulting grammars can be further simplified by minimization [16], but 
minimization is not needed for identifying dead code, since minimization does 
not affect whether a nonterminal derives only D. 
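Reading off the grammar at a program point, and deciding which points are dead, amounts to a reachability walk over the simplified productions. The sketch below is illustrative only: the pair encoding `(lhs, rhs)` and the names `grammar_at` and `dead_points` are assumptions, and the data is the third set of productions from Example 2 (the case where only nil versus cons matters for the result of odd).

```python
L = ("L",)


def con(c, *args):
    return ("con", c, args)


# Third set of productions in Example 2; N0 has no entry and derives only D.
THIRD_SET = {
    ("N19", con("cons", "N0", "N0")), ("N19", con("nil")),
    ("N18", con("nil")), ("N18", con("cons", "N0", "N0")),
    ("N17", L),
    ("N16", con("nil")), ("N16", con("cons", "N0", "N0")),
    ("N15", con("cons", "N0", "N0")), ("N15", con("nil")),
    ("N14", con("cons", "N0", "N0")), ("N14", con("nil")),
}


def grammar_at(prods, start):
    """Collect all productions whose left sides are reachable from start."""
    by_lhs = {}
    for lhs, rhs in prods:
        by_lhs.setdefault(lhs, set()).add(rhs)
    seen, stack, result = set(), [start], set()
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        for rhs in by_lhs.get(n, ()):
            result.add((n, rhs))
            if rhs[0] == "con":
                stack.extend(rhs[2])   # follow the component nonterminals
    return result


def dead_points(prods, labels):
    """Program points whose nonterminal has no regular production."""
    live = {lhs for lhs, _ in prods}
    return [k for k in labels if f"N{k}" not in live]


AT_PARAM = grammar_at(THIRD_SET, "N19")
DEAD = dead_points(THIRD_SET, range(1, 20))
```

Here `AT_PARAM` holds the two productions for `N19`, and `DEAD` is the list of labels 1 through 13. Minimizing the grammar would not change `DEAD`, because it never changes whether a nonterminal has a regular production.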



6 Dead code elimination, implementation, and extensions 

Consider all function parameters and subexpressions whose associated liveness
patterns are $D$. These parts of the program are dead, so we eliminate them by
replacing them with _.

Example 3. Suppose we need to know whether the result of odd is nil or cons
and therefore added $N_{18} \to nil()$, $N_{18} \to cons(D, D)$; eliminating dead
code based on the simplified grammar in Example 2, where $N_1$ to $N_{13}$ all
have only $D$ on the right hand sides, yields:

odd(x) = if null(x) then nil
         else cons(_, _)                                                  (18)

Suppose we need the result of function getmin, given in Figure 1; analyzing
and eliminating dead code yields the following function along with functions
listsecond and getmin:

minmax(x) = if null(x) then nil
            else if null(cdr(x)) then cons(triple(_, car(x), _), nil)
            else let v = minmax(cdr(x)) in
                 cons(triple(_, min(car(x), 2nd(car(v))), _), v)          (19)

As another example, if the result of minmax is used as argument to len instead 
of listsecond, then our algorithm finds that the entire triple constructions in 
minmax are dead. However, if the result of minmax is used as argument to 
odd, then none of the subexpressions in minmax is dead, since the triple is used 
in every odd recursive call. 
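The replacement step itself is a simple tree walk once the dead program points are known. The following Python sketch shows it for the body of odd from Example 3; the encoding of a labelled expression as `(label, operator, children)` and the function name `eliminate` are assumptions made for illustration, and the sketch only pretty-prints the result rather than rebuilding a program.

```python
def eliminate(e, dead):
    """Render expression e, replacing every dead program point with '_'."""
    label, op, kids = e
    if label in dead:
        return "_"
    parts = [eliminate(k, dead) for k in kids]
    if op == "if":
        return f"if {parts[0]} then {parts[1]} else {parts[2]}"
    return f"{op}({', '.join(parts)})" if parts else op


# Body of odd with the program-point labels of Example 1.
ODD_BODY = (18, "if", [
    (17, "null", [(16, "x", [])]),
    (15, "nil", []),
    (14, "cons", [
        (13, "car", [(12, "x", [])]),
        (11, "even", [(10, "cdr", [(9, "x", [])])]),
    ]),
])

PRUNED = eliminate(ODD_BODY, dead=set(range(1, 14)))
ORIGINAL = eliminate(ODD_BODY, dead=set())
```

With points 1 to 13 dead, `PRUNED` is the string `if null(x) then nil else cons(_, _)`, matching (18). Because a dead point is replaced as a whole, its subexpressions are never visited.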




Eliminating Dead Code on Recursive Data



Our dead code elimination preserves semantics in the sense that, if the origi- 
nal program terminates with a value, then the new program terminates with the 
same value. 

Two further optimizations are possible but need further study. First, minmax 
in (19) can be further optimized by removing the triple constructions and selec- 
tors. Second, when the result of minmax is used as argument to odd, there is 
no dead code in minmax, but the triple in every even call is indeed dead. One 
needs to unfold the definition of minmax to remove such dead computations. 

Eliminating dead code may improve efficiency in many ways. First, the re- 
sulting programs can run faster and use less space. Additionally, compilation 
of the optimized programs takes less time and also less space, which is espe- 
cially desirable when using libraries. Furthermore, smaller programs are easier 
to understand and maintain, yielding higher software productivity. 

Implementation. We have implemented the analysis in a prototype system. The 
implementation uses the Synthesizer Generator [44]. The algorithm for simpli- 
fying the grammars is written in the Synthesizer Generator Scripting Language, 
STk, a dialect of Scheme, and consists of about 300 lines of code. Other parts 
of the system support editing of programs, display of nonterminals at program 
points, construction of grammars, highlighting of dead code, etc., and consist of 
about 3000 lines of SSL, the Synthesizer Generator Specification Language. All 
the grammars for the examples in this paper are generated automatically using 
the system. 

We have used the system to analyze dozens of examples. The lengths of those 
programs range from dozens of lines to over a thousand lines. The analysis, al- 
though written in STk, is very efficient. Our original motivation for studying this 
general problem was for identifying appropriate intermediate results to cache and 
use for incremental computation [33]. There, we propose a method, called cache-
and-prune, that first transforms a program to cache all intermediate results,
then reuses them in a computation on incremented input, and finally prunes out 
cached values that are not used. Reusing cached values often produces asymp- 
totic speedup, but leaving in unused values can be extremely inefficient. The 
analysis method studied in this paper, when adopted for pruning, is extremely 
effective. The pruned programs consistently run faster, use less space, and are 
smaller in code size. We also used the analysis for eliminating dead code in
deriving incremental programs [34]. There, the speedup is often asymptotic. For
example, dead code elimination enables incremental selection sort to improve
from $O(n^2)$ time to $O(n)$ time.

Figure 6 summarizes the experimental results for a number of examples. Pro- 
gram minmax is as in Figure 1. Programs incsort and incout [34] are derived 
incremental programs for selection sort and outer product, respectively, where 
dead code after incrementalization is to be eliminated. Programs cachebin and 
cachelcs [31] are dynamic-programming programs transformed from straight- 
forward exponential-time programs for computing binomial coefficients and lon- 
gest common subsequences, respectively, with intermediate results cached, reused, 
and to be pruned. Program calend is a collection of calendrical calculation
functions [12], and program takr is a 100-function version of TAK that tries to
defeat cache memory effects [47].

Fig. 6. Experimental results.



When using dead code analysis for incrementalization and for pruning unused 
intermediate results, there is always a particular function of interest, shown in 
Figure 6. For general programs, especially libraries, such as the calend example, 
there may not be a single function that is of interest, so we have applied the
analysis to several different functions of interest.

The size of a program is precisely captured by the total number of program 
points, which for most programs is about twice the number of lines of code. The 
number of dead program points depends on both the program and the function 
of interest. For example, for libraries, such as the calend program, much dead 
code is found, whereas for takr, all 100 functions other than the driver function
run-takr are involved in calling each other. Our highlighting allows us to easily
see the resulting live or dead slices. For example, for several functions in the 
calend program, only the slice for date, not year or month, is needed. We can 
see the number of initial productions is roughly linear in the size of the given 
program, and the number of resulting productions is roughly linear in the number 
of live program points. 

The analysis time for simplifying the grammars, in milliseconds, is measured 
on an Ultra 10 with 299MHz CPU and 124 MB main memory. We can see that 
the analysis time is roughly linear in the number of live program points. This 
is important, especially for analyzing libraries, where being linear in the size of 
the entire program is clearly not good. We achieved this high efficiency by a 
careful but simple optimization in our simplification algorithm: after adding a 
new production, we consider only productions in extended forms whose right- 
hand sides use the left-hand side symbol of the new production. This makes the 
analysis proceed in an incremental fashion, and only program points that are 
not dead are followed. 
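The optimization described above can be sketched as a worklist algorithm. This Python sketch is an assumption-laden illustration and not the authors' STk implementation: it indexes each extended production by the one nonterminal it is waiting on, so a newly added regular production triggers only the extended productions that mention its left-hand side. The triple encoding `(lhs, guard, rhs)` and all names are hypothetical; the input is the set of productions for odd and even from Figure 4.

```python
from collections import defaultdict, deque

L = ("L",)
def con(c, *args): return ("con", c, args)
def nt(n): return ("nt", n)
def sel(c, i, n): return ("sel", c, i, n)

ODD_EVEN = {
    ("N19", None, nt("N16")), ("N19", None, nt("N12")), ("N19", None, nt("N9")),
    ("N16", "N17", con("cons", "N0", "N0")), ("N16", "N17", con("nil")),
    ("N17", "N18", L), ("N15", None, nt("N18")),
    ("N12", "N13", con("cons", "N13", "N0")), ("N13", None, sel("cons", 1, "N14")),
    ("N9", "N10", con("cons", "N0", "N10")), ("N10", "N11", nt("N8")),
    ("N7", None, nt("N11")), ("N11", None, sel("cons", 2, "N14")),
    ("N14", None, nt("N18")), ("N8", None, nt("N5")), ("N8", None, nt("N1")),
    ("N5", "N6", con("cons", "N0", "N0")), ("N5", "N6", con("nil")),
    ("N6", "N7", L), ("N4", None, nt("N7")),
    ("N1", "N2", con("cons", "N0", "N2")), ("N2", "N3", nt("N19")),
    ("N18", None, nt("N3")), ("N3", None, nt("N7")),
}

def simplify_worklist(prods):
    uses = defaultdict(set)     # X -> extended productions waiting on X
    regular = defaultdict(set)  # lhs -> regular right sides found so far
    work = deque()              # newly added regular productions

    def add(p):
        lhs, guard, rhs = p
        if guard is not None:
            key = guard                      # N -> [X]R'
        elif rhs[0] == "nt":
            key = rhs[1]                     # N -> X
        elif rhs[0] == "sel":
            key = rhs[3]                     # N -> c_i^{-1}(X)
        else:                                # regular: N -> L or N -> c(...)
            if rhs not in regular[lhs]:
                regular[lhs].add(rhs)
                work.append((lhs, rhs))
            return
        if p not in uses[key]:
            uses[key].add(p)
            for r in list(regular[key]):     # catch up on what is known
                fire(p, r)

    def fire(p, r):
        lhs, guard, rhs = p
        if guard is not None:
            add((lhs, None, rhs))
        elif rhs[0] == "nt":
            add((lhs, None, r))
        elif r[0] == "L":
            add((lhs, None, L))
        elif r[1] == rhs[1]:                 # same constructor: pick component
            add((lhs, None, nt(r[2][rhs[2] - 1])))

    for p in prods:
        add(p)
    while work:
        x, r = work.popleft()
        for p in list(uses[x]):
            fire(p, r)
    return {(lhs, None, r) for lhs, rs in regular.items() for r in rs}

NIL_OR_CONS = simplify_worklist(
    ODD_EVEN | {("N18", None, con("nil")), ("N18", None, con("cons", "N0", "N0"))})
```

In this run only the nonterminals $N_{14}$ to $N_{19}$ ever receive a regular production, so the productions for even are never fired, which illustrates why the running time tracks the number of live program points rather than the program size.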

To summarize, our method produces precise analysis results as desired. The
analysis is also very fast compared with other reported analyses using
constraints. For example, Heintze’s analysis takes on the order of seconds for
programs of 100 lines to over 1000 lines [19].

Figure 7 is a screen dump of the system on a small example of four functions 
and constructors nil and cons. Program points are annotated with nonterminals 
highlighted in red. The shaded region contains the function of interest. The two 
sets of productions are the original set and resulting set. Dead code (function 
bigfun as well as the first argument of cons in function f) is highlighted in
green.




Fig. 7. A prototype implementation. 



Extensions. We believe that our method for dead code analysis can be extended 
to handle side effects. The extension is to use graph grammars instead of tree 
grammars. The ideas of including L and D as terminals, constructing grammars 
based on program points as well as the semantics of program constructs con- 
necting these points, and doing grammar simplifications are the same. Recent 
work by Sagiv, Reps, and Wilhelm [46] uses graph grammars for shape analysis. 
We believe we can make similar use of graph grammars for dead code analysis 
in the presence of destructive updates. 

Our method can also be extended to handle higher-order functions in two 
ways, and we have worked out this extension in the second way. First, we can 
simply apply a control-flow analysis [50] before we do dead code analysis. This 
allows our method to handle complete programs that contain higher-order
functions. Second, we can directly construct productions corresponding to function
abstraction and application and add rules for simplifying them. This is similar to 
how Henglein [20] addresses higher-order binding-time analysis and how Heintze 
[19] handles higher-order functions for analyzing sets of values for ML programs. 
Similar use of constraints has been studied for stopping deforestation for higher- 
order programs [49]. Our extension adds two constraints/productions for each 
lambda expression and uses two additional rules for simplification; it is not yet 
implemented. Handling higher-order functions does not increase the time com- 
plexity of our algorithms. In fact, for a language with higher-order functions but 
not recursive data construction, the constraints may be simplified in worst-case 
almost linear time [20]. 

Our method is described here for an untyped language, but the analysis 
results provide an important kind of type information; the analysis may also 
be adopted to enhance soft typing; and the analysis applies to typed languages 
as well. For example, consider the third set of productions in Example 2. The 
grammar at each program point gives its liveness together with the shape of 
data. Dead code should be reported to the programmer before, or at least at the 
same time as, type errors such as 3rd(cons(1, 2)) in the dead code. Live code
may have its type inferred by small refinements of our rules. For example, if we
replace $L$ by Boolean for the condition in rule (5) of Figure 2, we have
$N_{17} \to$ Boolean in the third set of productions in Example 2, and thus
everything there is precisely typed. Finally, for a typed language, possible
values are restricted also by type information, so the overall analysis results
can be more precise, e.g., type information about the value of an expression $e$
can help restrict the grammar at $e$ when $e$ is the argument of a primitive
function $q$.



7 Related work and conclusion 

Our backward dependence analysis uses liveness patterns, which are domain
projections, to specify sufficient information. Wadler and Hughes use
projections for strictness analysis [56]. Their analysis is also backward but
seeks necessary rather than sufficient information, and it uses a fixed finite
abstract domain for all programs. Launchbury uses projections for binding-time
analysis of partially static data structures in partial evaluation [27]. It is a
forward analysis equivalent to strictness analysis and uses a fixed finite
abstract domain as well [28]. Mogensen [39], De Niel, and others [11] also use
projections, based on grammars in particular, for binding-time analysis and
program bifurcation, but they use only a restricted class of regular tree
grammars. Another kind of analysis is escape analysis [42,13,4], but existing
methods cannot express as precise information as we do.

Several analyses are in the same spirit as ours. The necessity interpretation by 
Jones and Le Metayer [24] uses necessity patterns that correspond to a restricted 
class of liveness patterns. Necessity patterns specify only heads and tails of list 
values. The absence analysis by Hughes [21] uses contexts that correspond to 
a restricted class of liveness patterns. Even if it is extended for recursive data
types, it handles only a finite domain of list contexts where every head context
and every tail context is the same. The analysis for pruning by Liu, Stoller, 
and Teitelbaum [33] uses projections to specify specific components of tuple 
values and thereby provide more accurate information. However, methods used 
there for handling unbounded growth of such projections are crude. Wand and 
Siveroni’s recent work [58] discusses safe elimination of dead variables but does 
not handle data constructions. Our method of replacing all dead code (including 
dead variables) by a dummy constant _ is simple, direct, and more general than 
their method; in particular, it is safe to simply remove dead function parameters. 

The idea of using regular tree grammars for program flow analysis is due to 
Jones and Muchnick [22], where it is used mainly for shape analysis and hence 
for improving storage allocation. It is later used to describe other data flow 
information such as types and binding times [38,39,2,11,59,51,45]. In particular, 
the analysis for backward slicing by Reps and Turnidge [45] explicitly adopts 
regular tree grammars to represent projections. It is closest in goal and scope 
to our analysis. However, it uses only a limited class of regular tree grammars, 
in which each nonterminal appears on the left side of one production, and each 
right side is one of five forms, corresponding to L, D, atom, pair, and atom | pair. 
It forces grammars to be deterministic in a most approximate way, and it gives 
no algorithms for computing the least fixed point from the set of equations. Our 
work uses general regular tree grammars extended with L and D. We also use 
productions of extended forms to make the framework more flexible. We give 
efficient algorithms for constructing and simplifying the grammars. Compared 
with [45], we also handle more program constructs, namely, binding expressions 
and user-defined constructors of arbitrary arity. 

Our treatment is rigorous, since we have adopted the view that
regular-tree-grammar-based program analysis is also abstract interpretation and
approximations can be built into the grammar transformers as a set of
constraints [9]. We extend the grammars and handle $L$ and $D$ specially in
grammar manipulations.
The result can also be viewed as using program-based finite grammar domains for 
yielding precise and efficient analysis methods. Another standard way to obtain 
the analysis result is to do a fixed point computation using general grammar 
transformers on potentially infinite grammar domains and use approximation 
operations to guarantee termination. Approximation operations provide a more 
general solution and make the analysis framework more modular and flexible 
[9]. In a separate paper [30], we describe three approximation operations that 
together produce significantly more precise analysis results than previous meth- 
ods. Each operation is efficient, but due to their generality and interaction, that 
work does not have an exact characterization of the total number of iterations 
needed. The finite domains described in this work make a complete analysis easy, 
and it yields a most precise liveness pattern for the data at each program point. 

Regular-tree-grammar-based program analysis can be reformulated as set- 
constraint-based analysis [18,19,9], but we do not know any work that treats 
precise and efficient dead code analysis for recursive data as we do. Melski 
and Reps [35,36] show the interconvertibility of a class of set constraints and
context-free-language reachability and, at the end of [35], they show how
general CFL-reachability can be applied nicely to program slicing. That essentially
addresses the same problem we do with a similar framework, but their descrip- 
tion is sketchy, with little discussion about correctness and with no results from 
implementation, experiments, or applications. 

The method and algorithms for dead code elimination studied here have many 
applications: program slicing and specialization [60,45], strength reduction, finite 
differencing, and incrementalization [7,41,34,32], caching intermediate results for 
program improvement [33], deforestation and fusion [55,6], as well as compile- 
time garbage collection [24,21,42,57]. The analysis results also provide a kind of 
type information. 

The overall goal of this work is to analyze dead data and eliminate compu- 
tations of them across recursions and loops, possibly interleaved with wrappers 
such as classes in object-oriented programs. This paper discusses techniques 
for recursion. The basic ideas should extend to loops. Pugh and Rosser’s work
has started in this direction; it extends slicing to symbolically capture
particular iterations in a loop [43]. Object-oriented programming is used
widely, but cross-class optimization heavily depends on inlining, which often
causes code blow-up. Grammar-based analysis and transformation can be applied to
methods across classes without inlining. A direct application would be to
improve techniques for eliminating dead data members, as noted by Sweeney and
Tip [53].

Even though this paper focuses on dead code analysis and dead code elimination
for recursive data, the framework for representing recursive substructures using
general regular tree grammars and the algorithms for computing them apply to
other analyses and optimizations on recursive data as well, e.g., binding-time
analysis for partial evaluation [27,39]. We have recently developed a
binding-time analysis using the same framework.

Abstract. Detecting whether different variables have the same value
at a program point is generally undecidable. Although the subclass of
equalities whose validity holds independently of the interpretation of
operators (Herbrand-equivalences) is decidable, the technique most widely
implemented in compilers, value numbering, is restricted to basic blocks.
Basically, there are two groups of algorithms aiming at globalizations of
value numbering. The first group is based on the algorithm of Kildall,
which uses data flow analysis to gather information on value equalities.
These algorithms are complete in detecting Herbrand-equivalences but
expensive in terms of computational complexity. The second group is
influenced by the algorithm of Alpern, Wegman and Zadeck. These algorithms
do not fully interpret the control flow, which allows them to be
particularly efficient, but at the price of being significantly less precise
than their Kildall-like counterparts. In this article we discuss how to
combine the best features of both groups by aiming at a fair balance
between computational complexity and precision. We propose an algorithm
that extends the one of Alpern, Wegman and Zadeck. The new algorithm is
polynomial and, in practice, expected to be almost as efficient as the
original one. Moreover, for acyclic control flow it is as precise as
Kildall’s, i.e., it detects all Herbrand-equivalences.



1 Motivation 

Detecting whether different variables have the same value at a program point 
is of major importance for program optimization, since equality information is 
a prerequisite of a broad variety of optimizations like common subexpression 
elimination, register allocation [10], movement of invariant code [13,16], branch 
elimination and branch fusion [10]. An even more comprehensive list is given in
[1].

Unfortunately, the equality problem, i.e., the problem of determining whether
two variables have the same value at a program point, is generally undecidable.
This holds even if control-flow branches are treated fully nondeterministically
[12]. On the other hand, the equality problem is decidable for the subclass of
equalities whose validity holds independently of the interpretation of
operators (Herbrand-equivalences¹). In practice, however, the equality problem
is usually only tackled on program fragments like (extended) basic blocks. Here
the state of the art is characterized by the technique of value numbering [4],
which is implemented in many experimental and production compilers.

¹ In [13] Herbrand-equivalence is called transparent equivalence.
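To make the notion concrete, here is a minimal Python sketch of value numbering within a single basic block. It is a generic textbook-style illustration, not the algorithm proposed in this article; the instruction format `(target, op, args)` and the name `value_number` are assumptions. Operators are left uninterpreted, so two variables receive the same value number exactly when they are built from the same operator applied to operands with equal value numbers, which is the Herbrand view of equality.

```python
def value_number(block):
    """Assign value numbers to the variables of one basic block.

    block is a list of (target, op, args) triples; args are variable names,
    and the pseudo-operator "copy" stands for a plain assignment.
    """
    numbers = {}   # variable -> current value number
    table = {}     # (op, operand value numbers) -> value number
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return counter

    def number_of(var):
        # Variables not yet defined in the block get a fresh number.
        if var not in numbers:
            numbers[var] = fresh()
        return numbers[var]

    for target, op, args in block:
        operands = tuple(number_of(a) for a in args)
        if op == "copy":
            numbers[target] = operands[0]
        else:
            key = (op, operands)
            if key not in table:
                table[key] = fresh()
            numbers[target] = table[key]
    return numbers


BLOCK = [
    ("a", "+", ("x", "y")),
    ("b", "+", ("x", "y")),
    ("c", "copy", ("a",)),
    ("d", "+", ("y", "x")),   # not equal to a: "+" is uninterpreted
]
VN = value_number(BLOCK)
```

In `VN`, the variables `a`, `b`, and `c` share one value number, while `d` gets a different one because commutativity of `+` is a property of its interpretation. Extending this idea across control-flow joins is where the global algorithms discussed in this article differ.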

Basically, there are two groups of algorithms aiming at globalizations of value
numbering. First, there is a group of algorithms focusing on precision. They are
mainly influenced by Kildall’s pioneering work [8], which is based on data flow
analysis.² Kildall’s method decides the equality problem of variables for the
class of Herbrand-equivalences. Its power, however, has its price in terms of
computational complexity. This is probably one of the major obstacles to its
widespread use in program optimization. In [16,9] a variant of this algorithm
is used in the context of semantic code motion.

Second, there is a group of algorithms paying more attention to efficiency.
Typical examples are the algorithm of Alpern, Wegman and Zadeck [1] and its
precursor by Reif and Lewis [12]. The algorithm of Fong, Kam and Ullman [6],
whose results are less precise than those of the two mentioned before, also
falls into this group. Characteristic of approaches in this group is a more
restricted treatment of the control flow of the program. In contrast to the
algorithms of the first group, where the control flow is fully interpreted, the
branching structure is treated to a large extent in a “syntactic” fashion. As a
consequence, they are significantly less precise but, on the other hand,
surprisingly efficient, i.e., almost of linear time complexity. Like Kildall’s
algorithm, the algorithm of Alpern, Wegman and Zadeck has also been used in the
context of semantic code motion [3,2,14].

In this article we show how to combine the best of both worlds.
Our approach is based on the algorithm of Alpern, Wegman and Zadeck. It
extends their approach by a normalization process, which resolves anomalies
caused by the syntactic treatment of the control flow in the original algorithm.
Our algorithm is of polynomial worst-case time complexity and, in practice,
expected to be almost as efficient as the original one while, for acyclic control
flow, being as precise as Kildall’s, i.e. it is complete for the class of
Herbrand-equivalences. We conjecture that our result can be extended to arbitrary
control flow. This would provide the first polynomial time algorithm for the
detection of all Herbrand-equivalences.³ For the sake of presentation, but without
loss of generality, we restrict ourselves in this article to the equality problem of
variables in a program. However, all approaches considered can easily be extended
to the equality problem of expressions.

The article is organized as follows. After introducing some basic notations
and definitions in Section 2, we briefly recall the two major alternative approaches
for detecting global value equalities in Sections 3 and 4. Central is then Section 5,
where we present our extension to the partitioning approach of Alpern, Wegman,
and Zadeck. Finally, we present our conclusions and a discussion of future work
in Section 6.

² In [15] Steffen shows how Kildall’s approach can be embedded into the framework
of abstract interpretation with respect to the Herbrand-semantics.

³ To our knowledge, the best known worst-case complexity estimation of advanced
Kildall-like algorithms is exponential. The estimation given in [8] is here misleading.



2 Preliminaries 

We consider procedures of imperative programs, which we represent by means of
directed flow graphs $G = (N, E, s, e)$ with node set $N$, edge set $E$, a unique start
node $s$ and end node $e$, which are assumed to have no incoming and outgoing
edges, respectively. The nodes of $G$ represent the statements and the edges the
nondeterministic control flow of the underlying procedure. We assume that all
statements are either the empty statement “skip” or 3-address assignments of
the form $x := y$ or $x := y_1\,\omega\,y_2$, where $x, y, y_1, y_2$ are variables and $\omega$ is a
binary operator. By $P[m, n]$ we denote the set of all finite paths from $m$ to $n$.
Without loss of generality we assume that every node $n \in N$ lies on a path from
$s$ to $e$. A node $m$ is dominated by a node $n$ if every path leading from $s$ to $m$
contains an occurrence of $n$. A node $n$ dominating $m$ with $n \neq m$ is a strict
dominator of $m$, and a strict dominator of $m$ that is dominated by all other strict
dominators of $m$ is an immediate dominator.
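
The dominance notions above can be made concrete with a small sketch. The following Python fragment is an illustration only and not part of the original presentation: it computes dominator sets by the standard iterative intersection over predecessors and then selects the immediate dominator according to the definition above. The function names `dominators` and `immediate_dominator` and the diamond-shaped example graph are assumptions chosen for illustration.

```python
def dominators(nodes, edges, start):
    """Return {m: set of nodes dominating m} for a flow graph (hypothetical helper)."""
    preds = {n: set() for n in nodes}
    for u, v in edges:
        preds[v].add(u)
    # Optimistic start: every node dominates every node, except at the start node.
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == start:
                continue
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominator(dom, m):
    """The strict dominator of m that is dominated by all other strict dominators of m."""
    strict = dom[m] - {m}
    for d in strict:
        if all(other in dom[d] for other in strict):
            return d
    return None


# Hypothetical diamond-shaped flow graph: s -> 1 -> {2, 3} -> 4 -> e
nodes = ["s", 1, 2, 3, 4, "e"]
edges = [("s", 1), (1, 2), (1, 3), (2, 4), (3, 4), (4, "e")]
dom = dominators(nodes, edges, "s")
```

In this example neither branch node dominates the join node 4, so its immediate dominator is the branching node 1.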

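The semantics of terms, which as usual are inductively composed of variables,
constants, and operators, is considered with respect to the Herbrand
interpretation $\mathcal{H} = (\mathbf{T}, \mathcal{H}_0)$. Here, $\mathbf{T}$ denotes the data domain given by the
set of terms, and $\mathcal{H}_0$ the interpretation function, which maps every constant
$c$ to $c$ and every operator $\omega$ to the total function $\mathcal{H}_0(\omega) : \mathbf{T} \times \mathbf{T} \to \mathbf{T}$
defined by $\mathcal{H}_0(\omega)(t_1, t_2) = t_1\,\omega\,t_2$. Denoting the set of all Herbrand states by
$\Sigma = \{\sigma \mid \sigma : V \to \mathbf{T}\}$ and the distinct start state, which is the identity on
$V$, by $\sigma_0$, the semantics of terms $t \in \mathbf{T}$ is given by the Herbrand semantics
$\mathcal{H} : \mathbf{T} \to (\Sigma \to \mathbf{T})$. It is inductively defined by:

$$\mathcal{H}(t)(\sigma) = \begin{cases}
\sigma(v) & \text{if } t = v \text{ is a variable} \\
\mathcal{H}_0(c) & \text{if } t = c \text{ is a constant} \\
\mathcal{H}_0(\omega)(\mathcal{H}(t_1)(\sigma), \mathcal{H}(t_2)(\sigma)) & \text{if } t = t_1\,\omega\,t_2
\end{cases}$$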

Every node of a flow graph is associated with a state transformation and a
backward-substitution function. If $n = x := t$ the corresponding state
transformation is defined by $\theta_n(\sigma) = \sigma[\mathcal{H}(t)(\sigma)/x]$, and the backward-substitution
of $n$ for a term $t'$ is defined by $\delta_n(t') = t'[t/x]$. If $n$ equals skip, both functions
are the identity on their domain. Both $\theta_n$ and $\delta_n$ can naturally be extended to
finite paths. This allows the following definition. Two expressions $t_1$ and $t_2$ are
called Herbrand-equivalent at the exit of node $n$ iff⁴

$$\forall p \in P[s, n].\ \mathcal{H}(t_1)(\theta_p(\sigma_0)) = \mathcal{H}(t_2)(\theta_p(\sigma_0))$$

⁴ Entry equivalence can be defined analogously.
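
To make the Herbrand semantics and the state transformations tangible, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' formulation: variables are encoded as strings, constants as integers, and a composite term $t_1\,\omega\,t_2$ as the tuple `(op, t1, t2)`; a node is either `None` (skip) or a pair `(x, t)` for $x := t$. The helper names and the two example paths are hypothetical, and equivalence is only checked over an explicitly given finite set of paths.

```python
def herbrand(t, sigma):
    """H(t)(sigma): evaluate term t in the Herbrand state sigma."""
    if isinstance(t, str):                 # t is a variable
        return sigma[t]
    if isinstance(t, tuple):               # t = t1 op t2: operators just build terms
        op, t1, t2 = t
        return (op, herbrand(t1, sigma), herbrand(t2, sigma))
    return t                               # t is a constant, mapped to itself


def theta(node, sigma):
    """State transformation of a node: None is skip, (x, t) is the assignment x := t."""
    if node is None:
        return sigma
    x, t = node
    new_sigma = dict(sigma)
    new_sigma[x] = herbrand(t, sigma)
    return new_sigma


def run_path(path, variables):
    """Apply theta along a finite path, starting from sigma_0 (the identity on V)."""
    sigma = {v: v for v in variables}
    for node in path:
        sigma = theta(node, sigma)
    return sigma


def herbrand_equivalent(t1, t2, paths, variables):
    """Check the defining condition over an explicitly enumerated set of paths."""
    states = [run_path(p, variables) for p in paths]
    return all(herbrand(t1, s) == herbrand(t2, s) for s in states)


# Hypothetical example: two paths reaching the same node.
V = ["a", "b", "x", "y"]
p1 = [("x", ("+", "a", 1)), None, ("y", ("+", "a", 1))]
p2 = [("x", ("+", "b", 1)), ("y", ("+", "b", 1))]
```

On both example paths `x` and `y` evaluate to the same term, so they are Herbrand-equivalent with respect to these paths, whereas `x` and the term `a + 1` differ on the second path.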
3 Kildall-Like Algorithms at a Glance



Kildall’s algorithm [8] uses data flow analysis for tackling the equality problem of 
variables in a program. In his original proposal equalities are represented as finite 
structured partitions. Rather than going into the technical details we illustrate 
the principal ideas by means of two meaningful examples, which are shown in 
Figure 1 and in Figure 2. 

In the program of Figure 1 the variables x and y have the same value at the
exit of node 4.





Fig. 1. Illustrating Kildall’s algorithm: (a) the original program and (b) the program
annotated with structured partitions.



Kildall’s algorithm computes for every node entry and node exit in the program
a structured partition, which characterizes all Herbrand-equivalences involving
a variable. Formally, a (finite) structured partition $\pi$ is a partition⁵ which

1. comprises the set of variables and expressions occurring in the program and

2. satisfies the following consistency constraint:

$$(e, e_1\,\omega\,e_2) \in \pi \;\wedge\; (e_1, e_1') \in \pi \;\wedge\; (e_2, e_2') \in \pi \;\Rightarrow\; (e, e_1'\,\omega\,e_2') \in \pi$$

An assignment at a node $n = x := t$ is associated with a local flow function $f_n$
defined by

$$f_n(\pi) = \{(t_1, t_2) \mid (\delta_n(t_1), \delta_n(t_2)) \in \pi\}.$$

The meet of two structured partitions $\pi_1$ and $\pi_2$ is given by their intersection.
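
The meet can be sketched in a few lines of Python. This is only an illustration, assuming partitions are given as lists of sets over the same finite universe of expressions (written here as plain strings); the function name `meet` is hypothetical. Viewing partitions as equivalence relations, two expressions stay in the same class of the meet exactly when they share a class in both arguments.

```python
def meet(p1, p2):
    """Intersection of two partitions, viewed as equivalence relations."""
    index1 = {e: i for i, cls in enumerate(p1) for e in cls}
    index2 = {e: i for i, cls in enumerate(p2) for e in cls}
    classes = {}
    for e in index1:
        if e in index2:
            # Elements are grouped by the pair of classes they belong to.
            classes.setdefault((index1[e], index2[e]), set()).add(e)
    return sorted((frozenset(c) for c in classes.values()), key=sorted)


# Hypothetical partitions arriving on two incoming edges of a join node.
pi_1 = [{"x", "y", "a+1"}, {"z"}]
pi_2 = [{"x", "y"}, {"a+1", "z"}]
pi_meet = meet(pi_1, pi_2)
```

Since classes of the result are indexed by pairs of classes, the number of classes of the meet can grow up to the product of the numbers of classes of its arguments.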

Actually, it is sufficient to represent only those classes containing a variable
and at least one additional element. This “sparse” representation can inductively
be extended to cover an arbitrarily large universe of expressions. An algorithm
constructing such a minimal representation has been proposed in [16].

⁵ Partitions can alternatively be considered equivalence relations on expressions. This
view is exploited in the following definitions.

In order to cope with loops the annotation is computed as the greatest fixed
point of an iteration sequence. It starts with the optimistic assumption that all
expressions are equal (universal data flow information $\top$) except for the start
node, where the choice of the empty partition reflects that no value equalities are
known at the entry of the program (see Figure 2 for illustration).



[Figure 2: two annotated flow graphs, (a) and (b), over node 1 (x := 0; y := x+1),
node 2 (y := y+1) and node 3 (x := x+1).]

Fig. 2. Treatment of loops by Kildall’s algorithm: (a) program with optimistic partition
annotation and (b) the greatest fixed point annotation reveals the equality of x and y
at the exit of node 3.



3.1 Results 

Kildall’s algorithm is precise for the class of Herbrand-equivalences. We have the
following soundness and completeness result [15]:

Theorem 1 (Soundness and Completeness). Two program variables x and
y are Herbrand-equivalent at a program point n if and only if (x, y) is contained
in the partition annotating n after termination of Kildall’s algorithm.

Unfortunately, the precision of Kildall’s algorithm has its price in terms of 
computational complexity. In its original formulation the growth of the size of 
partition classes is exponential in the number of classes in the partition. The 
following structured partition makes this behaviour evident: 

$$\pi_{exp} = \{[a, b],\ [c, d],\ [e, f],\ [x,\ a + c,\ a + d,\ b + c,\ b + d],\ [y,\ x + e,\ x + f,\ (a + c) + e,\ \ldots,\ (b + d) + f]\}$$
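
The growth of the classes can be retraced with a short enumeration. The following Python sketch is illustrative only; the helper names `structured_class` and `class_sizes` and the generalization to deeper chains of two-element classes are assumptions, not taken from the original text. Each class consists of its variable plus every sum that can be formed from a member of the left operand class and a member of the right operand class.

```python
from itertools import product


def structured_class(var, left_class, right_class, op="+"):
    """The variable plus all terms 'l op r' with l, r drawn from the operand classes."""
    return [var] + [f"({l} {op} {r})" for l, r in product(left_class, right_class)]


# The two composite classes of the partition above.
x_class = structured_class("x", ["a", "b"], ["c", "d"])
y_class = structured_class("y", x_class, ["e", "f"])


def class_sizes(depth):
    """Sizes of the composite classes when the construction is iterated (hypothetical)."""
    cls = ["a0", "b0"]
    sizes = []
    for i in range(1, depth + 1):
        cls = structured_class(f"x{i}", cls, [f"a{i}", f"b{i}"])
        sizes.append(len(cls))
    return sizes
```

Under this construction each level roughly doubles the size of the previous class (the size recurrence is $s_i = 2 s_{i-1} + 1$), which is the exponential behaviour referred to above.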

Besides the size of the partition classes, their number is also problematic.
Obviously, the meet of two partitions $\pi_1$ and $\pi_2$ is in the worst case of order
$\Omega(|\pi_1|\,|\pi_2|)$, where $|\pi_i|$ ($i = 1, 2$) refers to the number of classes in $\pi_i$.
Unfortunately, even on an acyclic program path $p$, a partition might be
subjected to a number of meet operations of order $\Omega(|p|)$. Together, a naive
estimation yields an exponential growth of the number of classes of partitions
(though no program is known exhibiting this behaviour).

In [16] we proposed a representation of structured partitions in terms of
structured partition DAGs, in which common substructures are shared [6]. This
dramatically reduces the space requirements, as can be seen from the structured
partition DAG representing the partition $\pi_{exp}$ considered above.

[Figure: structured partition DAG for $\pi_{exp}$ with leaf classes a,b, c,d and e,f.]

The usage of structured partition DAGs eliminates the exponential blow-up in
the representation of partition classes sketched above. However, the problem
of the growth of the number of classes is still present. Though, as mentioned
above, there is no example known exhibiting this exponential behaviour, one is
still faced with a quite extensive data structure where every program point is
annotated with a possibly large structured partition DAG. We suspect that this
is the main obstacle to the widespread usage of Kildall-like techniques.

4 Alpern, Wegman and Zadeck’s Algorithm at a Glance 

Similar to the previous section we focus on the essential steps and ideas
underlying Alpern, Wegman and Zadeck’s algorithm, or AWZ-algorithm for short.
We illustrate its essence on an informal and intuitive level. For this purpose we
consider the example of Figure 3(a), which is a slight variant of Figure 2(a).⁶
As mentioned in Section 1, the AWZ-algorithm works on flow graphs in static
single assignment (SSA) form [5]. In essence, this means that the variables of
the original program are replaced by new versions such that every variable
has a unique initialization point. At merge points of the control flow
pseudo-assignments $x_k := \phi_n(x_{i_1}, \ldots, x_{i_k})$ are introduced, meaning that $x_k$ gets the
value of $x_{i_j}$ if the join node is entered via the $j$th ingoing edge.⁷ The SSA form
of our running example is depicted in Figure 3(b).





Fig. 3. Illustrating the AWZ-algorithm: (a) the original program and (b) the program 
transformed into SSA form. 

⁶ Actually, the AWZ-algorithm fails on the example of Figure 2.

⁷ $\phi$-operators are indexed by their corresponding join node.

Based on the SSA form of a program the value graph is constructed. It
represents the value transfer of SSA variables along the control flow of the program.
Following the description in [10] the value graph is defined as a labelled directed
graph where
— the nodes correspond to occurrences of nontrivial assignments, i.e.
assignments whose right-hand side contains at least one operator, and to
occurrences of constants in the program. Every node is labelled with the
corresponding constant or operator symbol. Additionally, every node is annotated
by the set of variables whose value is generated by the corresponding
constant or assignment. An operator node is always annotated with the left-hand
side variable of its corresponding assignment. Moreover, for a trivial
assignment x := y the generating assignment of x is defined as the generating
assignment of y, and for a trivial assignment x := c the corresponding node
associated with c is annotated with x. For convenience, the constant or
operator label is drawn inside the circle visualizing the node, and the variable
annotation outside.

— Directed edges point to the operands of the right-hand side expression
associated with the node. Moreover, edges are labelled with natural numbers
according to the position of operands.⁸

Figure 4(a) shows the value graph corresponding to Figure 3(b). It is worth 
noting that the value graph is cyclic which is due to self-dependencies of variables 
in the loop. 





Fig. 4. (a) The value graph corresponding to Figure 3(b), and (b) the collapsed value 
graph of (a) after congruence partitioning. 

The central step of the AWZ-algorithm is a partitioning procedure determining
congruent nodes in the value graph. Like Kildall’s algorithm, the AWZ-algorithm
proceeds optimistically, computing a greatest fixed point. To this end, it starts
with a coarse partition which is subsequently refined. More precisely, the schedule
of the algorithm is as follows:

Start partition: all nodes of the value graph with identical constant or
operator label are grouped into the same class of the partition.

Refining a partition: two nodes n and m which belong to the same class
and are labelled with a k-ary operator are separated into different classes if
there is an $i \le k$ such that the $i$th operands of both nodes belong to different
classes.

⁸ We omit this labelling in our examples, making the implicit assumption that edges
are ordered from left to right.

For our illustrating example the AWZ-algorithm comes up with the collapsed 
value graph depicted in Figure 4(b). 
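
The schedule above can be sketched as a naive refinement loop. The Python fragment below is a simplified illustration and not the efficient Hopcroft-style implementation of [1]: it assumes the value graph is given as a dictionary mapping each node to its label and its ordered operand nodes, and it recomputes all signatures in every round. The function name `congruence_partition`, the node names and both example graphs are hypothetical.

```python
def congruence_partition(graph):
    """graph: {node: (label, operand_nodes)}; returns {node: class id} at the fixed point."""
    # Start partition: nodes with identical label share a class.
    cls = {n: label for n, (label, _) in graph.items()}
    while True:
        ids, new = {}, {}
        for n, (_, operands) in graph.items():
            # Nodes stay together only if all corresponding operands are in the same classes.
            signature = (cls[n],) + tuple(cls[m] for m in operands)
            new[n] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(cls.values())):   # no class was split: stable
            return new
        cls = new


# Hypothetical cyclic value graph: i1 = phi(0, i2), i2 = i1 + 1, and the same for j.
loop_graph = {
    "zero": ("0", ()), "one": ("1", ()),
    "i1": ("phi_n", ("zero", "i2")), "i2": ("+", ("i1", "one")),
    "j1": ("phi_n", ("zero", "j2")), "j2": ("+", ("j1", "one")),
}
loop_classes = congruence_partition(loop_graph)

# Hypothetical join: x2 = phi(a0 + 1, a1 + 1) versus y0 = phi(a0, a1) + 1.
join_graph = {
    "a0": ("2", ()), "a1": ("3", ()), "one": ("1", ()),
    "a2": ("phi_n", ("a0", "a1")),
    "x0": ("+", ("a0", "one")), "x1": ("+", ("a1", "one")),
    "x2": ("phi_n", ("x0", "x1")), "y0": ("+", ("a2", "one")),
}
join_classes = congruence_partition(join_graph)
```

In the first graph the optimistic start partition is already stable, so `i1`/`j1` and `i2`/`j2` end up congruent. In the second graph `x2` and `y0` carry different top-level labels and are therefore separated from the start, although both denote the same value on every path.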

4.1 Results 

As shown in [1] the AWZ-algorithm is sound. In terms of the original flow graph 
the result of Alpern, Wegman and Zadeck would read as follows: 

Theorem 2 (Soundness). Two variables are Herbrand-equivalent at a program
point n if there is a node in the collapsed value graph which is annotated
by the current SSA-instances of both variables.

Here, the notion of a current SSA-instance refers to the SSA-version of the
variable which immediately dominates the program point. Unfortunately, the
“only-if” direction does not hold. This means that the AWZ-algorithm is not
complete (cf. Section 4.2). However, it can be implemented very efficiently by
means of a technique which resembles Hopcroft’s algorithm for the minimization
of finite automata [7]. For a value graph with $e$ edges the AWZ-algorithm
terminates within $O(e \log e)$ steps. In contrast to Kildall-like approaches, which we
discussed in Section 3, it is pragmatically advantageous that the AWZ-algorithm
relies on a single global data structure only, which uniformly captures both the
control and the value flow.

4.2 Limitations 

In this section we discuss limitations of the AWZ-algorithm and show that it is
not complete. To this end we discuss typical situations in which it fails to detect
equalities of variables. Of course, according to Theorem 1 all these equalities are
detected by Kildall’s algorithm. The main weakness of the AWZ-algorithm is a
consequence of treating $\phi$-operators like ordinary operators. This way, part of
the control flow is not fully interpreted, but treated in a “syntactic” way.

We elucidate this by means of the example in Figure 1(a). In this example 
the AWZ-algorithm fails to detect the Herbrand-equivalence of x and y at the 
exit of node 4. Figure 5(a) shows the SSA form of this program and Figure 5(b) 
the collapsed value graph after congruence partitioning. 

The reason for this failure, i.e. the failure to detect the equality of $x_2$ and
$y_0$, is that the partitioning process treats $\phi$-operators like ordinary operators.
Hence, even in the start partition an expression with, say, top-level operator
“+” is separated from one with top-level operator $\phi$. In other words, the
AWZ-algorithm is highly sensitive to the position of $\phi$-operators in composite
expressions. In Section 5 we will present a normalizing transformation remedying
this drawback, which is the key to our approach.





Fig. 5. Concerning Figure 1(a): the value equality of x and y at the exit of node 
4 is not detected by the AWZ-algorithm. (a) The program in SSA form and (b) the 
collapsed value graph after congruence partitioning. 

Next, we consider the looping example of Figure 2(a). Also in this example 
the AWZ-algorithm fails to detect the equality of x and y at the exit of node 2. 

Again the reason lies in the distinct positions of $\phi$-operators, now, however,
in a cyclic context of the value graph. Figure 6 shows the program in SSA form
and the corresponding collapsed value graph.





Fig. 6. Concerning Figure 2(a): the AWZ-algorithm fails to detect the value
equivalence of x and y at the exit of node 4. (a) The program in SSA form and (b) the
collapsed value graph after congruence partitioning.



5 The AWZ-Algorithm with Integrated Normalization

In this section we will present our algorithm, which is an extension of the AWZ- 
algorithm. It works by modifying collapsed value graphs according to a set of 
graph rewrite rules. In order to give a precise formulation we introduce some 
notations and conventions. 

For a node $n$ of a collapsed value graph we denote the set of variables
annotating $n$ by $\mathit{vars}(n)$, its immediate successor nodes by $\mathit{succ}(n)$ and the set of
indirect successors by $\mathit{succ}^*(n)$.⁹ A node $n$ with $\mathit{vars}(n) = \emptyset$ is called an
anonymous node. For the sake of presentation we assume that $\phi$-operators are always
binary. Note that this can always be achieved by a “linearization” of join nodes
which have more than two incoming edges. The construction of this section,
however, is not tied to this assumption and can easily be extended to capture
k-ary $\phi$-operators as well.

5.1 The Normalization Rules 

The normalization process is driven by the two graph rewrite rules depicted in
Figure 7. In these rules the left-hand side of the large arrow denotes a pattern
occurring in the collapsed value graph, which is replaced by the graph pattern
on the right-hand side. Incoming and outgoing edges of nodes in the argument
pattern which are not part of the pattern are not touched by the applications
of the rules. It should be noted that separate nodes of the pattern may match
the same node in the collapsed value graph. Operator labels in the pattern are
matched according to the following convention: labels $\phi$ and $\omega$ in the pattern
match with $\phi$-operators and ordinary operators, respectively. Unlabelled nodes of
the pattern match any node in the collapsed value graph.

Rule (1) is already mentioned in [1], however, only as a one-step postprocess
for simplifying the value graph after termination of the congruence partitioning.
In our approach Rule (1) is an integral part of the iteration process. It eliminates
unnecessary $\phi$-operators, which can occur either as the result of the partitioning
process or of applications of Rule (2). Rule (1) is applicable whenever a node
$n$ with a $\phi$-operator is present whose operands refer to the same node, say $m$. In
this case any edge pointing to $n$ is redirected to $m$, the variable annotations of $n$
are added to $m$, and finally, node $n$ is eliminated.

Rule (2) is a new normalization rule. Essentially, it can be regarded as a
directed distributivity rule. “Expressions” with $\phi$-operators are rewritten to have
$\phi$-operators innermost whenever this is possible. Less formally, this rule reads as:

$$\phi_m(a\,\omega\,b,\ c\,\omega\,d) \;\Rightarrow\; \phi_m(a, c)\,\omega\,\phi_m(b, d)$$

Rule (2) is applicable if there is a node $n$ with a $\phi$-operator both of whose
operands have the same ordinary operator label, say $\omega$, at top level. Moreover, $n$
must not be strictly followed by an anonymous node. The pattern is then modified
as displayed on the right side of the arrow:

— Two new nodes labelled with the $\phi$-operator of $n$ are introduced and
connected with the operands of $l$ and $r$ as depicted in Figure 7.

— Node $n$ gets $\omega$ as its operator label.

— Finally, the outgoing edges of $n$ are redirected to the new nodes.
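
The effect of the two rules can be illustrated on plain terms rather than on collapsed value graphs. The following Python sketch is a deliberate simplification and not the graph rewriting procedure itself: it assumes terms are nested tuples, `("phi", join, l, r)` for a $\phi$-operator indexed by its join node and `(op, l, r)` for an ordinary operator, with strings as leaves, and it ignores variable annotations and the side condition on anonymous nodes. The function name `normalize` and the example terms are hypothetical.

```python
def normalize(t):
    """Push phi-operators inwards (Rule 2) and drop redundant ones (Rule 1)."""
    if not isinstance(t, tuple):
        return t                                   # leaf: variable or constant
    if t[0] == "phi":
        _, join, left, right = t
        left, right = normalize(left), normalize(right)
        if left == right:                          # Rule (1): both operands coincide
            return left
        if (isinstance(left, tuple) and isinstance(right, tuple)
                and left[0] != "phi" and left[0] == right[0]):
            op = left[0]                           # Rule (2): same ordinary operator
            return (op,
                    normalize(("phi", join, left[1], right[1])),
                    normalize(("phi", join, left[2], right[2])))
        return ("phi", join, left, right)
    op, left, right = t
    return (op, normalize(left), normalize(right))


# Hypothetical terms: x2 = phi_n(a0 + 1, a1 + 1) and y0 = phi_n(a0, a1) + 1.
x2 = ("phi", "n", ("+", "a0", "1"), ("+", "a1", "1"))
y0 = ("+", ("phi", "n", "a0", "a1"), "1")
```

Applied to `x2`, Rule (2) moves the $\phi$-operator below `+`, and Rule (1) then removes the $\phi$-operator whose operands are both the constant `1`, so that `x2` and `y0` obtain the same normal form.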

Proposing a rule system directly raises questions on termination and conflu- 
ence of the rewrite process where we consider congruence partitioning a graph 
rewriting step, too. Fortunately, both properties are satisfied. 

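⁹ Formally, $\mathit{succ}^*(n)$ is the smallest set with $\mathit{succ}(n) \subseteq \mathit{succ}^*(n)$ and
$\forall m \in \mathit{succ}^*(n).\ \mathit{succ}(m) \subseteq \mathit{succ}^*(n)$.

[Figure 7: the two rewrite rules; Rule (1) carries the annotation
$\mathit{vars}(m) := \mathit{vars}(m) \cup \mathit{vars}(n)$.]

Fig. 7. Graph rewrite rules for normalizing collapsed value graphs.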



Lemma 1 (Confluence and Termination). The graph rewrite system con- 
sisting of the rules in Figure 7 together with congruence partitioning is termi- 
nating and confluent. 

Proof (Sketch): Termination is proved in the complexity paragraph of Section
5.3, where the complexity of the algorithm is estimated. Thus, we concentrate
here on the proof of confluence. According to Newman’s Theorem [11] it
suffices to prove local confluence. Obviously, congruence partitioning preserves the
potential of applications of Rule (1) and Rule (2). A rule which can be applied
before collapsing can also be applied after collapsing. To gain local confluence
the rule then has to be applied further on all parts which are merged into the
common structure. Moreover, it is easy to see that two possible applications of
either Rule (1) or Rule (2) can be performed in any order. Thus, the only
interesting case where two rules may overlap is a conflict between applications of
Rule (1) and Rule (2). The diagram resulting from this situation, together with
the way in which it can be completed, is shown in Figure 8. □

5.2 The Iteration Strategy 

The rules heavily interact with each other. For instance, Rule (1) may eliminate
a $\phi$-node above an operator node. This enables Rule (2). Vice versa, Rule (2) may
enable Rule (1) as already seen in Figure 8. In addition, Rule (1) and Rule (2)
may open further opportunities for the partitioning algorithm, and vice versa it
may trigger further rule applications.

Fig. 8. Local confluence of the rewrite rules.

In order to fully capture the interdependencies we thus propose the following
schedule of the application order:



BEGIN 

Start with the value graph as collapsed value graph. 

REPEAT 

1. Perform the partitioning step of the AWZ-algorithm on the 
collapsed value graph. 

2. Apply Rule (1) and Rule (2) exhaustively.¹⁰

UNTIL the collapsed value graph is stable 

END 



The power of these rules is demonstrated by the examples of Section 4.2,
which are out of the scope of the classical AWZ-algorithm. Starting with the
collapsed value graph of Figure 5(b), application of Rule (2) followed by Rule
(1) results in the value graph of Figure 9(a). A subsequent partitioning step leads
to the value graph of Figure 9(b), where the equality of x and y is revealed as
desired.

Our approach also succeeds for the cyclic situation of Figure 6(b). Figure
10(a) depicts the value graph after an application of Rule (2) and Rule (1).
Starting with this situation the partitioning algorithm detects the equality of $x_1$
and $y_2$ as shown in Figure 10(b).

5.3 Results 

¹⁰ After application of Rule (2) the created $\phi$-nodes have to be immediately checked
for applicability of Rule (1).

Fig. 9. The algorithm succeeds on the example of Figure 5. The collapsed value graph
after (a) the application of normalization rules and (b) after a further partitioning step.

Fig. 10. The algorithm succeeds on the example of Figure 6. The collapsed value graph
after (a) the application of normalization rules and (b) after a further partitioning step.

Together with the soundness of the congruence partitioning, which has been
proved in [1], and the obvious soundness of Rule (1) and Rule (2) we have:

Theorem 3 (Soundness). Two variables are Herbrand-equivalent at a program
point n if there is a node in the collapsed value graph which is annotated
by the current SSA-instances of both variables.

For acyclic programs our algorithm is even complete, i.e., it detects all 
Herbrand-equalities. We have: 

Theorem 4 (Completeness (Acyclic Case)). In an acyclic program two
program variables are Herbrand-equivalent at a program point n if and only if
there is a node in the collapsed value graph which is annotated by the current
SSA-instances of both variables.

Proof (Sketch): The if-direction holds because of Theorem 3. The only-if direc- 
tion, and hence completeness, can be shown by an induction on the structure of 
collapsed value graphs (DAGs in this case). 

Let us consider two nodes n and m of a collapsed value graph¹¹ denoting
Herbrand-equivalent terms such that n and m are not strictly followed by an
anonymous node. Then we are going to show by induction on the sum of the
depths¹² of m and n that the two nodes are collapsed into a single one by our
algorithm. The induction base where n and m are labelled with constants is
trivial. Obviously, Herbrand-equivalence forces that n and m are labelled with
the same constant and thus are collapsed by the congruence partitioning step.

¹¹ Possibly at some arbitrary intermediate stage of the transformation.

For the induction step we may assume that n and m refer to Herbrand-equivalent
terms $op_n(u_1, u_2)$ and $op_m(v_1, v_2)$.¹³ Here we have to distinguish
three different situations:

Case 1: If $op_n$ and $op_m$ are ordinary operators, the definition of
Herbrand-equivalence implies that $op_n = op_m$, say $\omega$, and that $u_1$ is Herbrand-equivalent
with $v_1$ and $u_2$ with $v_2$. By the induction hypothesis the nodes belonging to
$u_1$ and $v_1$ and to $u_2$ and $v_2$, respectively, are collapsed. Hence n and m are
finally collapsed by congruence partitioning.

Case 2: If $op_n$ and $op_m$ are both $\phi$-operators, the situation is analogous to
Case 1 if they are identical. Otherwise, let $op_n = \phi_r$ and $op_m = \phi_s$. Without
loss of generality we may assume that r strictly dominates s.¹⁴ Then both
$v_1$ and $v_2$ are also Herbrand-equivalent with $\phi_r(u_1, u_2)$. According to the
induction hypothesis m’s immediate successors can be merged into a single
node m'. Thus m can be eliminated by applying Rule (1), which makes the
induction hypothesis applicable to n and m'.

Case 3: If $op_n$ is an ordinary operator $\omega$ and $op_m = \phi_r$, then we may assume
that r strictly dominates the definition site of $\omega(u_1, u_2)$ (otherwise the
$\phi$-node can be eliminated with the same reasoning as in Case 2). Moreover,
one may assume that $\omega(u_1, u_2)$ is immediately dominated by r, as the
backward substitution¹⁵ $\delta_p(\omega(u_1, u_2))$ along an arbitrary path p between r and
the definition site of $\omega(u_1, u_2)$ is also characterized by n. In order to show
that both nodes belonging to $v_1$ and $v_2$ are labelled by $\omega$ one may assume
that $\delta_p(\omega(u_1, u_2))$ is further backward substituted along the ingoing branches
of r. Virtually extending the value graph such that these expressions are
contained, the induction hypothesis becomes applicable, yielding the desired
labelling of $v_1$ and $v_2$. This makes Rule (2) applicable, which turns m into
a node labelled with $\omega$, too. Moreover, another application of the induction
hypothesis yields that the newly introduced $\phi$-operators are collapsed with
$u_1$ and $u_2$, which guarantees that no anonymous $\phi$-operators are introduced.
Thus this case finally reduces to a situation where the reasoning of Case 1
becomes applicable. □



¹² The depth $d(n)$ of a node $n$ is 0 if $n$ is a leaf node, and otherwise inductively defined
by $\max(d(l(n)), d(r(n))) + 1$, where $l(n)$ and $r(n)$ refer to the left and right child
node of $n$.

¹³ Note that these expressions are contained in the collapsed value graph under
consideration.

¹⁴ Note that by our assumption that n and m are Herbrand-equivalent at a certain
program point, r must strictly dominate s or vice versa.

¹⁵ The backward substitution has to be slightly modified in order to take $\phi$-operators
into account.

Complexity. By construction, $\phi$-nodes added by Rule (2) only point to nodes
labelled by some variable. Thus the number of newly generated nodes is bounded by
$n^2$, where $n$ denotes the number of nodes in the original value graph. As successful
applications of Rule (1) and congruence partitioning steps are guaranteed to
delete nodes, the total number of successful rule applications is of order $O(n^2)$.
With the fact that congruence partitioning is the most expensive of the rewrite
steps this amounts to an overall complexity of order $O(n^4 \log(n))$. It should
be noted that this is an extreme worst-case scenario. In practice, we rather
expect the collapsed value graph to be linear in the size of the original value graph,
which would reduce the computational complexity to a reasonable bound of
$O(n^2 \log(n))$.
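
The two bounds can be retraced by a short calculation. The following is only one hedged reading of the argument above: it assumes that a single partitioning step on a graph with $e$ edges costs $O(e \log e)$ as stated in Section 4.1, that $e$ is proportional to the number of nodes because operators are binary, and, for the expected case, that the number of rewrite steps is linear as well (this last point is an assumption not stated in the text).

```latex
\begin{align*}
\text{worst case:} \quad
  & \underbrace{O(n^2)}_{\text{rewrite steps}} \cdot
    \underbrace{O(n^2 \log n^2)}_{\text{one partitioning step on } O(n^2) \text{ nodes}}
    = O(n^4 \log n), \quad \text{since } \log n^2 = 2 \log n, \\
\text{expected case:} \quad
  & \underbrace{O(n)}_{\text{rewrite steps}} \cdot
    \underbrace{O(n \log n)}_{\text{one partitioning step on } O(n) \text{ nodes}}
    = O(n^2 \log n).
\end{align*}
```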



6 Conclusions 



Detecting the equality of variables (and based thereon those of expressions) is a 
prerequisite of a large variety of program optimizations ranging from partial re- 
dundancy elimination over common subexpression elimination to constant prop- 
agation. Since the general problem is undecidable, it is usually considered with 
respect to the Herbrand interpretation. With respect to this interpretation, flow 
graphs, the representation of programs most commonly used in optimization, 
represent value equivalences and the value flow locally, i. e. statementwise: the 
left-hand-side variable of an assignment equals the value of its right-hand-side 
expression. Value flow graphs (cf. [16]) represent the opposite pole: value equiv- 
alences of variables and expressions and the value flow are represented globally, 
i. e. across the complete program. In this scenario value graphs (cf. [1]) stand 
between flow graphs and value flow graphs with respect to performance and 
precision: the equivalence and value flow information represented can efficiently 
be computed, however, at the price of losing precision. In this article we showed 
how to enhance the value-graph approach in order to arrive at an algorithm 
which for acyclic control flow fairly combines the efficiency of the value-graph 
approach with the precision of the value-flow-graph approach. The resulting al- 
gorithm is optimal for acyclic programs, i. e. it detects all value equivalences with 
respect to the Herbrand interpretation. We are currently exploring an extension 
to arbitrary control flow. In particular, we are investigating an adaptation of the
presented strategy, in which Rule (1) and the congruence partitioning process
are merged, and Rule (2) (together with Rule (1)) is exhaustively exploited in 
a preprocess stage. To the best of our knowledge this would provide the first 
algorithm for the detection of Herbrand equivalences with proven polynomial 
time complexity. 

In addition to the theoretical perception of what is the essence of value 
equivalence detection, we expect that our approach has an important impact in 
practice as the considerably weaker basic value-graph approach of [1] is widely 
used in practice. 




Detecting Equalities of Variables: Combining Efficiency with Precision 247 



References 

1. B. Alpern, M. Wegman, and F. K. Zadeck. Detecting equality of variables in 
programs. In Conf. Record of the ACM Symposium on the Principles of 
Programming Languages {POPP), January 1988. 

2. P. Briggs, K. D. Cooper, and L. T. Simpson. Value numbering. Software- Practice 
and Experience, 27(6):701-724, June 1997. 

3. C. Click. Global code motion/global value numbering. In Proc. ACM SIGPLAN 
Conference on Programming Language Design and Implementation {PLDI), vol- 
ume 30,& of ACM SIGPLAN Notices, pages 246-257, La Jolla, CA, June 1995. 

4. J. Cocke and J. T. Schwartz. Programming languages and their compilers. Courant 
Institute of Mathematical Sciences, NY, 1970. 

5. R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and F. K. Zadeck. Efficiently
computing static single assignment form and the control dependency graph. ACM
Transactions on Programming Languages and Systems, 13(4):451-490, 1991.

6. A. Fong, J. B. Kam, and J. D. Ullman. Application of lattice algebra to loop
optimization. In Conf. Record of the 2nd ACM Symposium on the Principles of
Programming Languages (POPL), pages 1-9, Palo Alto, CA, 1975.

7. J. Hopcroft. An n log n algorithm for minimizing the states of a finite automaton.
The Theory of Machines and Computations, pages 189-196, 1971.

8. G. A. Kildall. A unified approach to global program optimization. In Conf. Record
of the ACM Symposium on the Principles of Programming Languages (POPL),
pages 194-206, Boston, MA, 1973.

9. J. Knoop, O. Rüthing, and B. Steffen. Code motion and code placement: Just
synonyms? In Proc. European Symposium on Programming (ESOP), Lec-
ture Notes in Computer Science 1381, pages 154-196, Lisbon, Portugal, 1998.
Springer-Verlag.

10. S. S. Muchnick. Advanced Compiler Design & Implementation. Morgan Kaufmann,
San Francisco, CA, 1997.

11. M. H. A. Newman. On theories with a combinatorial definition of equivalence.
Annals of Math., 43(2):223-243, 1942.

12. J. H. Reif and R. Lewis. Symbolic evaluation and the global value graph. In Conf.
Record of the 4th ACM Symposium on the Principles of Programming Languages
(POPL), pages 104-118, Los Angeles, CA, 1977.

13. B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Global value numbers and redun-
dant computations. In Conf. Record of the 15th ACM Symposium on the Principles
of Programming Languages (POPL), pages 12-27, San Diego, CA, 1988.

14. L. T. Simpson. Value-driven redundancy elimination. Technical Report TR98-308,
Rice University, April 6, 1998.

15. B. Steffen. Optimal run time optimization. Proved by a new look at abstract
interpretations. In Proc. International Joint Conference on the Theory and
Practice of Software Development (TAPSOFT), Lecture Notes in Computer Sci-
ence 249, pages 52-68, Pisa, Italy, 1987. Springer-Verlag.

16. B. Steffen, J. Knoop, and O. Rüthing. The value flow graph: A program represen-
tation for optimal program transformations. In Proc. 3rd European Symposium on
Programming (ESOP), Lecture Notes in Computer Science 432, pages 389-405,
Copenhagen, Denmark, 1990. Springer-Verlag.




A New Class of Functions for Abstract 
Interpretation 



Jörg Köller and Markus Mohnen

Lehrstuhl für Informatik II, RWTH Aachen, Germany
{koeller,mohnen}@informatik.rwth-aachen.de



Abstract. In the context of denotational style abstract interpretation 
it is crucial to have an efficient fixed point solver. In general, however, 
finding a fixed point requires exponential time. One approach to improv- 
ing the efficiency is the use of special classes of functions. A well-known 
example for such a class are additive functions, which allow the reduction 
to quadratic runtime. 

In this paper, we demonstrate that additive functions are not suited in a 
higher-order context. To overcome this deficiency, we introduce the class 
of component-wise additive functions, which are an extension of the class 
of additive functions. 

Component-wise additive functions allow us to solve many higher-order 
equation systems in polynomial time. We stress the usefulness of our 
class by presenting a package for implementing abstract interpretations 
using our theoretical results. Furthermore, experimental results taken in 
a case study for escape analysis are presented to relate our approach to 
other approaches. 



1 Introduction 

Abstract interpretation [CC77] is a powerful technique in the field of compile- 
time program analysis. For the approximation of program-properties the solution 
of an equation system is computed by a fixed point iteration. Because finding this 
solution is an expensive operation, efficient implementations of fixed point solvers 
are needed. This is especially crucial in the presence of higher-order functions, 
because the data structures representing these functions become far too large 
to use a naive approach: Exponential complexity in the first-order case and a 
double exponential complexity or worse in the higher-order case. Although there 
are more sophisticated data structures like BDDs [Bry86,BBR90] and Frontiers
[MH87] which work well in a first-order framework, these approaches do not help 
in the higher-order cases. 

Our approach focuses on restricting the class of functions allowed in the 
fixed point iteration. Hence we can exploit special properties of the restricted 
classes for an efficient implementation. In [NN92] the use of additive functions 
was studied with the result of a quadratic complexity for finding a fixed point: 
The fixed point is reached faster than in the general framework and the functions 
involved can be represented more efficiently. In principle, these results also hold 
for additive higher-order functions. However, the most fundamental operation
in the higher-order context is not additive: The application of a function to 
an argument. Therefore, additive functions are not really useful for abstract 
interpretation of modern programming languages. 

We introduce a new class of functions, called component-wise additive func- 
tions, extending the class of additive functions. This class still has proper-
ties that yield polynomial complexity for the computation of fixed points, even in
the presence of higher-order functions. Hence we get a class of functions allowing 
us to solve higher-order equation systems efficiently. In addition, this class is also 
useful in a purely first-order context, since it closes the fairly large gap between 
monotone and additive functions. 

Based on these ideas we developed a package that can be used to create 
and solve higher-order equation systems. We implemented different represen- 
tations of functions like BDDs, so we can handle all monotone functions. In
addition, we implemented a specialisation for component-wise additive func-
tions. It exploits the restriction to component-wise additive functions by us-
ing several optimisations yielding polynomial runtime and memory usage for
all component-wise additive equation systems, including the higher-order case. The pack-
age includes a description language for the implementation of a large class of 
abstract interpretations. We used this package to implement escape analysis 
[Moh95b,Moh95a,Moh97]. Experimental results using this analysis demonstrate
the usefulness of component-wise additive functions. 

The content of this paper is structured as follows: In the next section we 
briefly review the main results of [NN92]. Furthermore, we show why pure addi- 
tive functions are not suitable in the context of higher-order analysis. Based on 
these observations, we introduce the class of component-wise additive functions 
in Section 3. In Section 4 we describe the package for the implementation of ab- 
stract interpretations. This includes the definition of a higher-order functional 
language for recursive equation systems suitable for many abstract interpreta- 
tions. Experimental results taken in a case study for escape analysis are presented 
in Section 5. The paper concludes with a summary and an outlook on future work.



2 The General, Monotone, and Additive Frameworks 

One approach to denotational style abstract interpretations approximates pro-
gram properties by the least fixed point of a monotone functional $\Phi : D \to D$.
Here, $(D, \le)$ is a finite complete lattice of abstract values. The well-known fixed
point theorem of Tarski guarantees that there exists a $k \in \mathbb{N}$ such that

\[ \mathrm{lfp}(\Phi) = \Phi^k(\bot) \]

Hence, the resulting algorithm for computing $\mathrm{lfp}(\Phi)$ in this setting is to compute
$\Phi(\bot), \Phi^2(\bot), \ldots$ until two successive elements are equal. Obviously, the cost
of this algorithm strongly depends on the number $k$ of iterations necessary to
reach this point of stability.
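
The iteration described above can be sketched in a few lines of Python. This is only an illustration under assumed names: `lfp` and the example functional `phi` are hypothetical and not taken from the package described later; the lattice $D$ is assumed to be the 4-bit vectors, encoded as integers and ordered bitwise.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def lfp(phi: Callable[[T], T], bottom: T) -> Tuple[T, int]:
    """Iterate phi from bottom until two successive elements are equal.

    Returns the fixed point and the number k of strictly increasing steps.
    Termination relies on phi being monotone on a finite lattice.
    """
    current, steps = bottom, 0
    while True:
        nxt = phi(current)
        if nxt == current:
            return current, steps
        current, steps = nxt, steps + 1


# Hypothetical monotone functional on 4-bit vectors (assumption for illustration):
# set bit 0 and propagate every set bit one position upwards.
def phi(d: int) -> int:
    return (d | 0b0001 | (d << 1)) & 0b1111


fixed_point, k = lfp(phi, 0)
```

In this example the chain climbs one bit per step, so the number of iterations equals the height of the lattice, which is the worst case permitted by the bound on $k$.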
First-order abstract interpretations can be modelled in this setting by as-
suming that the lattice $D$ is a subset of the lattice of functions of type

\[ A^p \to B^q \]

where $p, q \in \mathbb{N}$, $A$ and $B$ are finite complete lattices, and $\le$ is the usual argument-
wise order. In [NN92], bounds for $k$ were presented for this class of abstract
interpretations. The crucial point is to find a good representation for $D$ and to
determine how additional properties can be used to improve this representation. In addition,
the representation forms the basis for implementations of these algorithms.

In the next two subsections, we summarise the main results of [NN92], fol-
lowing their notation: Given a finite complete lattice $(L, \le)$ we write

- $c(L)$ for the number of elements of $L$ (cardinality of $L$)

- $h(L)$ for the maximal length of chains in $L$ (height of $L$)

Since $\Phi$ is monotone, the elements $\Phi(\bot), \Phi^2(\bot), \ldots$ form a chain and we have

\[ k \le h(D) \]

Depending on which subset of $A^p \to B^q$ we choose as the lattice $D$, we can
consider different frameworks.



2.1 General and Monotone Framework

We assume that $D$ is the set of all functions of type $A^p \to B^q$, which we write
as $D = A^p \to B^q$. The obvious representation for a function $f$ is to store the
graph of $f$, i.e. a table of $c(A)^p$ entries. For each entry, a chain of at most
$h(B^q) = q \cdot h(B)$ ascending elements exists. In a chain of functions at least one
entry must increase in each step and we have

\[ h(A^p \to B^q) = c(A)^p \cdot q \cdot h(B) \]

Hence, finding a fixed point has exponential runtime. Neither the bound nor the
representation changes if we restrict $D$ to monotone functions, $A^p \to_m B^q$.

2.2 Additive Framework

In this framework, we restrict $D$ to those functions $f : A^p \to B^q$ which are strict
and additive, i.e. $f(\bot) = \bot$ and $f(a_1 \sqcup a_2) = f(a_1) \sqcup f(a_2)$ for all $a_1, a_2 \in A^p$. We
denote this framework by $D = A^p \to_{sa} B^q$. Note that every additive function
is monotone.

Examples of real-life abstract interpretations featuring additive functions
are first-order escape analysis [Moh95b] and liveness analysis [NN89]. Strictness
analysis [BHA86], however, is not additive, since branches are approximated by
functions of the form $f(a, b, c) = a \sqcap (b \sqcup c)$.

The important property here is additivity: If we can find $\{a_1, \ldots, a_n\} \subseteq A^p$
such that every $a \in A^p$ can be represented as $a = a_{i_1} \sqcup \ldots \sqcup a_{i_j}$ for
$i_1, \ldots, i_j \in \{1, \ldots, n\}$, then functions $f \in D$ can be represented in a better way:
Only the values $f(a_1), \ldots, f(a_n)$ are stored in a table. Since we have

\[ f(a) = f(a_{i_1} \sqcup \ldots \sqcup a_{i_j}) = f(a_{i_1}) \sqcup \ldots \sqcup f(a_{i_j}) \]

this is sufficient as a complete representation.

Fortunately, every finite complete lattice contains such a subset, consisting
of the join-irreducible elements: $a$ is join-irreducible iff $a = a_1 \sqcup a_2$ implies that
$a = a_1$ or $a = a_2$. It is trivial that $\bot$ is join-irreducible. We write $\mathcal{I}(L)$ for the set of
non-bottom join-irreducibles of a finite complete lattice $(L, \le)$, $\mathcal{I}_\bot(L)$ for all join-
irreducibles, and $i(L)$ for the number of elements in $\mathcal{I}(L)$. It is well known [DP90]
that for every element $a \in L$ there exists $I \subseteq \mathcal{I}(L)$ such that $a = \bigsqcup I$. Furthermore,
we can define a unique representation of $a$ by choosing an $I$ such that $a_1, a_2 \in I$ and
$a_1 \ne a_2$ implies $a_1 \not\le a_2$. We denote this by $\mathcal{I}(a)$. Obviously, $a = a'$ iff
$\mathcal{I}(a) = \mathcal{I}(a')$, and $a$ is irreducible iff $\mathcal{I}(a) = \{a\}$.

The degree of improvement is determined by the ratio of $i(L)$ and $c(L)$. For
the important class of finite distributive complete lattices, we have $i(L) = h(L)$.
But more important in our setting is the relation between $\mathcal{I}(A^p)$ and $\mathcal{I}(A)$: A join-
irreducible vector consists of exactly $p - 1$ bottom elements and one non-bottom
irreducible element from $A$. Hence, we have $i(A^p) = p \cdot i(A)$, independent of the
properties of $A$.

The resulting representation of a strict and additive function $f$ stores the
table consisting of $f(\mathcal{I}(A^p))$. Again, in a chain of functions at least one entry
must increase in each step and we have

\[ h(A^p \to_{sa} B^q) = p \cdot i(A) \cdot q \cdot h(B) \]

Here, finding a fixed point can be done with a quadratic number of iterations.
If we omit the strictness, then we have to store $f(\bot)$ as well. For this case, we
define $\mathcal{I}_\bot(L) := \mathcal{I}(L) \cup \{\bot\}$.
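
To make the table representation concrete, the following Python sketch stores a strict additive function on bit vectors by its values on the join-irreducibles only. It is an illustration under assumptions: the lattice is taken to be $\mathbb{B}^n$ encoded as an integer bit mask (so the non-bottom join-irreducibles are the single-bit vectors), and the names `irreducibles`, `StrictAdditive` and `rotate` are hypothetical.

```python
from typing import Callable, List


def irreducibles(a: int) -> List[int]:
    """Non-bottom join-irreducibles below a: the single-bit vectors of a."""
    return [1 << i for i in range(a.bit_length()) if (a >> i) & 1]


class StrictAdditive:
    """Strict additive function B^n -> B^m, stored as n table entries."""

    def __init__(self, n: int, f: Callable[[int], int]) -> None:
        # Only f(e_0), ..., f(e_{n-1}) are stored, not all 2^n values.
        self.table = [f(1 << i) for i in range(n)]

    def __call__(self, a: int) -> int:
        result = 0  # bottom; strictness gives f(bottom) = bottom
        for i, entry in enumerate(self.table):
            if (a >> i) & 1:
                result |= entry  # join of the entries for the irreducibles of a
        return result


# Assumed example: rotating a 3-bit vector to the left is strict and additive.
def rotate(a: int) -> int:
    return ((a << 1) | (a >> 2)) & 0b111


compact = StrictAdditive(3, rotate)
```

Here the table has 3 entries instead of the 8 entries of the full graph, and `compact(a)` agrees with `rotate(a)` on every 3-bit vector.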



2.3 The Higher-Order Case 

Nielson and Nielson say that “our results . . . are not limited to a first-order
framework of program analyses”. They argue that nothing was assumed about
the structure of $A$ and $B$; for instance, $A$ can be a lattice of functions. Of
course, they are absolutely right about this: The results for the general, monotone,
and additive frameworks remain true.

Unfortunately, virtually no abstract interpretation will benefit from this re-
sult, because one of the most fundamental operations in the higher-order setting is
not additive: The function $\mathit{apply} : (A \to B) \times A \to B$ which applies a function
to an argument. Consider the lattice of booleans $\mathbb{B}$ with $0 < 1$ as base lattice:
The irreducibles are $\mathcal{I}(\mathbb{B}) = \{1\}$ and $\mathcal{I}(\mathbb{B} \to \mathbb{B}) = \{\mathit{id}, \neg\}$ and hence $\mathit{id} = \mathit{id} \sqcup \bot$
and $1 = 1 \sqcup 0$. Therefore we have

\[ \mathit{apply}(\mathit{id}, 1) = 1 \ne 0 = \mathit{apply}(\mathit{id}, 0) \vee \mathit{apply}(\bot, 1) \]

This example also shows that it is not sufficient merely to restrict the first
argument to additive functions, since $\mathit{id}$ is already additive.
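
The counterexample can be checked mechanically. In the following Python sketch (an illustration only; the tuple encoding of functions and all names are assumptions), a function $\mathbb{B} \to \mathbb{B}$ is stored as the pair of its values at 0 and 1, and the join of functions is taken pointwise.

```python
# A function B -> B is encoded as the tuple (f(0), f(1)).
identity = (0, 1)
bottom_fn = (0, 0)


def join_fn(f, g):
    """Pointwise join of two functions B -> B."""
    return tuple(x | y for x, y in zip(f, g))


def apply(f, a):
    """Apply the encoded function f to the boolean a."""
    return f[a]


# The pair (id, 1) is the join of (id, 0) and (bottom, 1) in the product lattice.
lhs = apply(join_fn(identity, bottom_fn), 0 | 1)
# If apply were additive, lhs would equal the join of the two component results.
rhs = apply(identity, 0) | apply(bottom_fn, 1)
```

The two sides differ (`lhs` is 1, `rhs` is 0), which is exactly the failure of additivity stated above.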

An additional obstacle becomes obvious when we look at another fundamen-
tal operation, at least in the context of functional languages: For the function
$\mathit{papply} : (\mathbb{B} \times \mathbb{B} \to \mathbb{B}) \times \mathbb{B} \to (\mathbb{B} \to \mathbb{B})$, which partially applies a binary function
to an argument, we have $\mathit{papply}(\vee, 1) = \mathbf{1}$, where $\mathbf{1}$ is the constant function. Ob-
viously, $\mathbf{1}$ is not strict. We can easily fix this problem by omitting the strictness
condition and moving to the framework $D = A^p \to_a B^q$. The table representing
a function $f$ is extended with $f(\bot)$. We have:

\[ h(A^p \to_a B^q) = p \cdot (i(A) + 1) \cdot q \cdot h(B) \]

However, this table entry is only used for computing $f(\bot)$; we do not need to
include it in the computation of other values, because every additive function is
monotone and hence $f(\bot) \le f(a)$ for all $a \in A^p$, especially for all table entries.

3 The Component-Wise Additive Framework 

In this section, we use the observations we made for additive functions in the
higher-order case to define a new framework: We now consider $D$ to be the set
of component-wise additive functions and we write $D = A^p \to_{ca} B^q$.

Definition 1. $f : A^p \to B^q$ is component-wise additive iff for all $a_1, \ldots, a_p \in A$,
$1 \le i \le p$, and $a_i' \in A$ holds:

\begin{align*}
f(a_1, \ldots, a_{i-1}, a_i \sqcup a_i', a_{i+1}, \ldots, a_p) ={} & f(a_1, \ldots, a_{i-1}, a_i, a_{i+1}, \ldots, a_p) \\
& \sqcup f(a_1, \ldots, a_{i-1}, a_i', a_{i+1}, \ldots, a_p)
\end{align*}

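Obviously, every additive function is component-wise additive but not vice
versa: For instance, $\wedge : \mathbb{B} \times \mathbb{B} \to \mathbb{B}$ is not additive ($1 \wedge 1 \ne (1 \wedge 0) \vee (0 \wedge 1)$) but
component-wise additive since

\begin{align*}
1 \wedge 1 &= (1 \vee 0) \wedge 1 = (1 \wedge 1) \vee (0 \wedge 1) \\
&= 1 \wedge (1 \vee 0) = (1 \wedge 1) \vee (1 \wedge 0)
\end{align*}

Moreover, every component-wise additive function is monotone and hence:

\[ (A^p \to_m B^q) \supseteq (A^p \to_{ca} B^q) \supseteq (A^p \to_a B^q) \]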

Using this property, a function $f$ can be represented by the table consisting of
$f(\mathcal{I}_\bot(A), \ldots, \mathcal{I}_\bot(A))$, i.e. each entry holds the function's value for a tuple of join-
irreducibles. Computing $f(a_1, \ldots, a_p)$ can be done by decomposing each $a_i$ into
join-irreducibles (including $\bot$) and joining the table entries for all combinations.
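
A minimal Python sketch of this table representation follows, assuming $A = \mathbb{B}^n$ encoded as integer bit masks; the class `ComponentWiseAdditive` and the helper `decompose` are hypothetical names chosen for illustration, and bitwise conjunction serves as the example function.

```python
from itertools import product
from typing import Callable, List


def decompose(a: int, n: int) -> List[int]:
    """Join-irreducibles below a in B^n, including bottom (0)."""
    return [0] + [1 << i for i in range(n) if (a >> i) & 1]


class ComponentWiseAdditive:
    """Component-wise additive f : (B^n)^p -> B^m stored on tuples of irreducibles."""

    def __init__(self, n: int, p: int, f: Callable[..., int]) -> None:
        self.n = n
        skeleton = [0] + [1 << i for i in range(n)]  # bottom plus unit vectors
        # (n + 1)^p table entries instead of (2^n)^p for the full graph.
        self.table = {args: f(*args) for args in product(skeleton, repeat=p)}

    def __call__(self, *args: int) -> int:
        result = 0
        # Join the table entries for all combinations of irreducibles.
        for combo in product(*(decompose(a, self.n) for a in args)):
            result |= self.table[combo]
        return result


# Bitwise conjunction is component-wise additive but not additive.
conj = ComponentWiseAdditive(3, 2, lambda a, b: a & b)
```

For $n = 3$ and $p = 2$ the table holds 16 entries rather than 64, and `conj(a, b)` reproduces `a & b` for all arguments.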

In contrast to the additive case, we have no exponential reduction here. The
improvement is only determined by the ratio of $i(A)$ and $c(A)$:

Theorem 1.

\[ h(A^p \to_{ca} B^q) = (i(A) + 1)^p \cdot q \cdot h(B) \]

In the worst case, we have $i(A) + 1 = c(A)$ and there is no improvement at all.
However, as already mentioned earlier, we often have $i(A) \ll c(A)$: For instance,
choosing $A = \mathbb{B}^n$ gives a reduction from $c(\mathbb{B}^n) = 2^n$ to $i(\mathbb{B}^n) = n$ and hence this
approach still reduces the maximal number of iterations by a factor of $\left(\frac{2^n}{n+1}\right)^p$.
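
To get a feeling for the three bounds, they can be evaluated for concrete parameters. The Python functions below simply transcribe the height formulas of Sections 2.1, 2.2 and Theorem 1; the function names and the chosen parameters ($A = B = \mathbb{B}^4$, $p = 2$, $q = 1$) are assumptions made for this example.

```python
def height_general(c_a: int, p: int, q: int, h_b: int) -> int:
    """h(A^p -> B^q) = c(A)^p * q * h(B)"""
    return c_a ** p * q * h_b


def height_strict_additive(i_a: int, p: int, q: int, h_b: int) -> int:
    """h(A^p ->_sa B^q) = p * i(A) * q * h(B)"""
    return p * i_a * q * h_b


def height_component_wise(i_a: int, p: int, q: int, h_b: int) -> int:
    """h(A^p ->_ca B^q) = (i(A) + 1)^p * q * h(B)"""
    return (i_a + 1) ** p * q * h_b


# Assumed example: A = B = B^n with n = 4, so c(A) = 2^4, i(A) = 4, h(B) = 4.
n, p, q = 4, 2, 1
general = height_general(2 ** n, p, q, n)
additive = height_strict_additive(n, p, q, n)
component_wise = height_component_wise(n, p, q, n)
```

For these parameters the bounds are 1024 (general), 32 (strict additive) and 100 (component-wise additive), which places the new class between the two existing frameworks.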

3.1 The Higher-Order Case 

Unfortunately, the unrestricted function $\mathit{apply} : (A \to B) \times A \to B$ is not
component-wise additive. For instance, for $A = B = \mathbb{B}$ we have:

\[ 0 = \mathit{apply}(\neg, 1) = \mathit{apply}(\neg, 1 \vee 0) \ne \mathit{apply}(\neg, 1) \vee \mathit{apply}(\neg, 0) = 0 \vee 1 = 1 \]

But if we restrict the functional argument to (component-wise) additive func-
tions, i.e. we consider $\mathit{apply} : (A \to_a B) \times A \to B$, then we have

\begin{align*}
\mathit{apply}(f_1 \sqcup f_2, a_1 \sqcup a_2) &= (f_1 \sqcup f_2)(a_1 \sqcup a_2) = f_1(a_1 \sqcup a_2) \sqcup f_2(a_1 \sqcup a_2) \\
&= f_1(a_1) \sqcup f_1(a_2) \sqcup f_2(a_1) \sqcup f_2(a_2) \tag{1}
\end{align*}

and hence this function is component-wise additive.

For real-life abstract interpretations the additional restriction to component- 
wise additive functions on all levels is no real problem: Typically, all functions are 
built in a compositional way. If we start from additive base functions and ensure 
that all constructions preserve component-wise additivity, then everything works 
beautifully. 

In higher-order functional languages like Haskell [HPW92] n-ary functions are
often¹ replaced by their curried counterpart: Given a function $f : A \times B \to C$ its
curried version $f_c : A \to (B \to C)$ is defined by $f_c(a) = b \mapsto c$ iff $f(a, b) = c$. It
is easy to see how this concept fits into our considerations: $f$ is component-wise
additive iff $f_c$ is additive and $f_c(a)$ is additive for all $a \in A$.

Theorem 2.

\[ A \times B \to_{ca} C \;\cong\; A \to_a (B \to_a C) \]

¹ Clean [BvELP87] is an example of a functional language where this is not the case.



3.2 Improving the Representation

If we followed the approach presented earlier, the function $\mathit{apply}$ would have to be
represented by the table $\mathit{apply}(\mathcal{I}(A \to_a B), \mathcal{I}(A))$. However, the set of irreducible
additive functions $\mathcal{I}(A \to_a B)$ is not a simple set: We do not know a method
for its construction, except for an explicit test of all functions. However, the fixed
point solver must obviously be able to construct the irreducibles in order to fill
the tables. Hence, choosing this representation would cause a severe efficiency
penalty.

In contrast, $\mathcal{I}(A \to B)$ can be characterised very easily: $f$ is join-irreducible
iff $f = \bot[a/b]$ and $b$ is join-irreducible in $B$.
Theorem 3.

\[ f \in \mathcal{I}(A \to B) \iff \exists a \in A : f(a) \in \mathcal{I}(B) \text{ and } \forall a' \ne a : f(a') = \bot \]

Proof. $\Leftarrow$ is trivial. Assume that there exists an $f \in \mathcal{I}(A \to B)$ which fails to
fulfil the right side. Since $f \ne \bot$ there are two cases to consider: The first case is
that there exist $a, a' \in A$ with $a \ne a'$, $f(a) \ne \bot$ and $f(a') \ne \bot$. We define $f_1 = f[a/\bot]$
and $f_2 = f[a'/\bot]$. The second case is that there exists $a \in A$ such that $f(a') = \bot$
for all $a' \ne a$ but $f(a) \notin \mathcal{I}(B)$, i.e. $f(a) = b_1 \sqcup b_2$ with $b_1 \ne f(a)$ and $b_2 \ne f(a)$.
We define $f_1 = f[a/b_1]$ and $f_2 = f[a/b_2]$. In both cases, $f = f_1 \sqcup f_2$, $f \ne f_1$, and
$f \ne f_2$, which contradicts the assumption that $f \in \mathcal{I}(A \to B)$.

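For example, Figure 1 shows the set $\mathbb{B} \times \mathbb{B} \to \mathbb{B}$, where each function $f$ is
represented by the inverse image $f^{-1}(1)$, i.e. the set of tuples which are mapped
to 1. The join-irreducibles $\mathcal{I}(\mathbb{B} \times \mathbb{B} \to \mathbb{B})$ are set in frames.

Fig. 1. $\mathbb{B} \times \mathbb{B} \to \mathbb{B}$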
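
The characterisation of Theorem 3 can be confirmed for this small lattice by brute force. The following Python sketch is an illustration with assumed names: it encodes each function $\mathbb{B} \times \mathbb{B} \to \mathbb{B}$ by the set of tuples mapped to 1, so that the pointwise join becomes set union, and tests join-irreducibility directly from the definition.

```python
from itertools import combinations, product

# All argument tuples of B x B.
points = list(product((0, 1), repeat=2))

# Each function is the frozenset of tuples it maps to 1 (its inverse image of 1).
functions = [frozenset(c)
             for r in range(len(points) + 1)
             for c in combinations(points, r)]


def is_join_irreducible(f: frozenset) -> bool:
    """Non-bottom f is join-irreducible iff f = g join h implies g = f or h = f."""
    if not f:
        return False  # bottom is excluded here
    return all(g == f or h == f
               for g in functions for h in functions if (g | h) == f)


irreducible = [f for f in functions if is_join_irreducible(f)]
```

Of the 16 functions, exactly the four that map a single tuple to 1 are join-irreducible, in line with the form $\bot[a/b]$ given by the theorem.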

Since we have such a simple characterisation of the join-irreducibles in $A \to B$,
we would prefer to use $\mathcal{I}(A \to B)$ instead of $\mathcal{I}(A \to_{ca} B)$ or $\mathcal{I}(A \to_a B)$. Recall
from the last section that the only requirement which led to join-irreducibles
was that every element was representable as a finite join. Obviously, this is also
true for $\mathcal{I}(A \to B)$: Every (component-wise) additive function is representable
in terms of $\mathcal{I}(A \to B)$. Hence, we generalise the approach of the last section in
the following way:

Definition 2. Let $L, M$ be lattices with $L \subseteq M$ and $a \sqcup_L b = a \sqcup_M b$ for all
$a, b \in L$. A set $S \subseteq M$ is called a join-skeleton for $L$ iff for every $a \in L$ there exists
$S_a \subseteq S$ such that $a = \bigsqcup S_a$. We call $S_a$ the $S$-representation of $a$.

Obviously, $\mathcal{I}(L)$ is a join-skeleton for $L$ and therefore this is an extension
of the approach described so far. Furthermore, $\mathcal{I}(A \to B)$ is a join-skeleton for
$A \to_{ca} B$ and $A \to_a B$.

Note that $L$ does not need to be a sub-lattice of $M$, which would also require
that $a \sqcap_L b = a \sqcap_M b$ for all $a, b \in L$. In fact, $A \to_a B$ is not a sub-lattice of
$A \to B$.
To represent a function $f : L \to L'$ in terms of a join-skeleton $S \subseteq M$ it is
obviously necessary to go beyond the domain of the original function.

Definition 3. Let $S$ be a join-skeleton for $L$. A function $f : L \to L'$ is $S$-
additive iff there exists $f_S : S \to L'$ such that for all $a \in L$ with $S$-representation
$S_a$ holds: $f(a) = f(\bigsqcup S_a) = \bigsqcup_{a' \in S_a} f_S(a')$.

Again, it is trivial to see that this notion generalises the previous ones: A
function $f : A \to B$ is additive iff it is $\mathcal{I}(A)$-additive and a function $g : A \times B \to C$
is component-wise additive iff it is $\mathcal{I}(A) \times \mathcal{I}(B)$-additive.

For the task at hand, we can use this approach to avoid representing the
function $\mathit{apply} : (A \to_a B) \times A \to B$ in terms of $\mathit{apply}(\mathcal{I}(A \to_a B), \mathcal{I}(A))$.
Since every additive function from $A \to_a B$ has an $\mathcal{I}(A \to B)$-representation,
Equation 1 shows that $\mathit{apply}$ is $\mathcal{I}(A \to B) \times \mathcal{I}(A)$-additive.



3.3 Further Improvements 

Besides the reduction of the number of argument combinations to store for the
representation of a function, there are other points which can help to speed up
the fixed point computation. The idea is to store an object as the list of the
irreducible elements it can be decomposed into. Hence an empty list denotes $\bot$.
This representation has several advantages: To compute an application we do
not have to decompose an element into its join-irreducibles, which we would
otherwise have to do. The other advantage is that not all functions need the same
space to be stored. Simple functions, i.e. functions consisting of few irreducibles, need
less memory. Our tests have shown that this has a major influence on memory
usage.
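
The following Python sketch illustrates this idea under simplifying assumptions; it is not the representation used in the package. Values of $\mathbb{B}^n$ are stored as sets of bit indices (one index per irreducible, the empty set being $\bot$), and a strict additive function is stored as a sparse table that omits all entries mapping to $\bot$. The names `apply_additive` and `table` are hypothetical.

```python
from typing import Dict, FrozenSet

Value = FrozenSet[int]          # set of irreducibles, given as bit indices
BOTTOM: Value = frozenset()     # the empty collection denotes bottom


def join(a: Value, b: Value) -> Value:
    """Join of two values is the union of their irreducibles."""
    return a | b


def apply_additive(table: Dict[int, Value], arg: Value) -> Value:
    """Apply a strict additive function given by its sparse table.

    The argument is already stored as its irreducibles, so no decomposition
    step is needed; missing table entries stand for bottom.
    """
    result = BOTTOM
    for irreducible in arg:
        result = join(result, table.get(irreducible, BOTTOM))
    return result


# Assumed example on 4-bit vectors: bit 0 is mapped to bits {1, 2}, bit 3 to bit 0,
# and bits 1 and 2 are mapped to bottom (hence not stored at all).
table = {0: frozenset({1, 2}), 3: frozenset({0})}
image = apply_additive(table, frozenset({0, 3}))
```

Only two of the four possible table entries are stored, which mirrors the observation that simple functions need less memory.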

4 A Package for Abstract Interpretation 

In this section, we describe a language-independent package for implementing a 
large class of abstract interpretations based on (nested) bit vectors. 



4.1 A Language for Abstract Interpretation 

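Let $\mathit{Tup}(\mathbb{B})$ and $\mathit{Tho}(\mathbb{B})$ be the extensions of $\mathbb{B}$ to arbitrarily nested tuples and
functional types. We define the signature $\Sigma := \mathit{Const} \cup \mathit{Lub} \cup \mathit{Tup} \cup \mathit{Proj}$ where

\begin{align*}
\mathit{Tup} &= \{ \mathit{tup}_{t_1, \ldots, t_n \to t} \mid \forall t_1, \ldots, t_n \in \mathit{Tup}(\mathbb{B}),\ (t_1, \ldots, t_n) = t \} \\
\mathit{Proj} &= \{ \mathit{proj}^i_{(t_1, \ldots, t_n) \to t_i} \mid \forall t_1, \ldots, t_n \in \mathit{Tup}(\mathbb{B}),\ 1 \le i \le n \} \\
\mathit{Lub} &= \{ \mathit{lub}_{t, t \to t} \mid \forall t \in \mathit{Tup}(\mathbb{B}) \}
\end{align*}

Note that we have no if-then-else, because we intend to use fixed point semantics
and therefore need monotone functions. Since we want additive base functions,
we do not include a $\mathit{glb}$-operator. Let $F$ and $X$ be fixed $\mathit{Tho}$-sorted sets of function
resp. argument variables. The syntax of expressions $\mathit{Exp}$ is defined as follows.

\begin{align*}
c &\in \mathit{Exp}^t && \text{(constants)} \\
f(e_1, \ldots, e_n) &\in \mathit{Exp}^t \text{ if } f \in \Sigma_{t_1, \ldots, t_n \to t},\ e_i \in \mathit{Exp}^{t_i} && \text{(basic ops)} \\
F^{(t)} &\in \mathit{Exp}^t && \text{(function vars)} \\
x^{(t)} &\in \mathit{Exp}^t && \text{(argument vars)} \\
(e_1\ e_2) &\in \mathit{Exp}^{t_2} \text{ if } e_1 \in \mathit{Exp}^{t_1 \to t_2},\ e_2 \in \mathit{Exp}^{t_1} && \text{(application)}
\end{align*}

Now we can define a recursive equation system for sets $F$ and $X$ of variables as
a sequence of equations $F_i(x_1, \ldots, x_n) = \mathit{rhs}_i$ with matching types.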

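The definition of the semantics is straightforward. We first present the se-
mantics for expressions and then extend it to the usual least fixed point semantics of
equation systems. As a base for our abstract domain we assume complete lattices
$\mathcal{A}_i = (A_i, \le_{A_i})$, where $A_i$ is the set of plain bit vectors of length $i$ and $\le_{A_i}$ are
the usual component-wise orders. We extend these domains to arbitrarily nested
tuples and function spaces.

Let $\mathcal{A}$ denote the union of all $A_i$ and their extensions. Let $\kappa \in \mathit{Env}_{\mathit{Const}}$,
$\chi \in \mathit{Env}_X$, and $\rho \in \mathit{Env}_F$ be type-correct mappings from $\mathit{Const}$, $X$ resp. $F$ to
$\mathcal{A}$. Then the semantics $\mathcal{E}$ of expressions is inductively defined as:

\begin{align*}
\mathcal{E}[\![c]\!](\kappa, \chi, \rho) &= \kappa(c), \text{ if } c \in \mathit{Const} \\
\mathcal{E}[\![x]\!](\kappa, \chi, \rho) &= \chi(x), \text{ if } x \in X \\
\mathcal{E}[\![f]\!](\kappa, \chi, \rho) &= \rho(f), \text{ if } f \in F \\
\mathcal{E}[\![\mathit{lub}_{t,t \to t}(e_1, e_2)]\!](\kappa, \chi, \rho) &= \mathcal{E}[\![e_1]\!](\kappa, \chi, \rho) \sqcup_t \mathcal{E}[\![e_2]\!](\kappa, \chi, \rho) \\
\mathcal{E}[\![\mathit{tup}_{t_1, \ldots, t_n \to t}(e_1, \ldots, e_n)]\!](\kappa, \chi, \rho) &= (\mathcal{E}[\![e_1]\!](\kappa, \chi, \rho), \ldots, \mathcal{E}[\![e_n]\!](\kappa, \chi, \rho)) \\
\mathcal{E}[\![\mathit{proj}^i_{(t_1, \ldots, t_n) \to t_i}((e_1, \ldots, e_n))]\!](\kappa, \chi, \rho) &= \mathcal{E}[\![e_i]\!](\kappa, \chi, \rho) \\
\mathcal{E}[\![(e_1\ e_2)]\!](\kappa, \chi, \rho) &= (\mathcal{E}[\![e_1]\!](\kappa, \chi, \rho)\ \mathcal{E}[\![e_2]\!](\kappa, \chi, \rho))
\end{align*}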
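
The inductive definition translates almost directly into an evaluator. The Python sketch below is a simplified illustration, not the package's implementation: expressions are assumed to be encoded as nested tuples with a leading tag, types are not checked, bit vectors are integers, and functions are ordinary Python callables. All names (`lub`, `evaluate`, the tags) are hypothetical.

```python
def lub(a, b):
    """Least upper bound on bit vectors (ints), tuples, and functions."""
    if isinstance(a, tuple):
        return tuple(lub(x, y) for x, y in zip(a, b))
    if callable(a):
        return lambda arg: lub(a(arg), b(arg))  # pointwise join
    return a | b


def evaluate(e, kappa, chi, rho):
    """Semantics of expressions for environments kappa (constants),
    chi (argument variables) and rho (function variables)."""
    tag = e[0]
    if tag == "const":
        return kappa[e[1]]
    if tag == "var":
        return chi[e[1]]
    if tag == "fvar":
        return rho[e[1]]
    if tag == "lub":
        return lub(evaluate(e[1], kappa, chi, rho),
                   evaluate(e[2], kappa, chi, rho))
    if tag == "tup":
        return tuple(evaluate(sub, kappa, chi, rho) for sub in e[1:])
    if tag == "proj":  # ("proj", i, e) selects the i-th component, 1-based
        return evaluate(e[2], kappa, chi, rho)[e[1] - 1]
    if tag == "app":
        return evaluate(e[1], kappa, chi, rho)(evaluate(e[2], kappa, chi, rho))
    raise ValueError(f"unknown expression tag: {tag}")


# Assumed example: x lub (F (proj^2 (tup(x, y)))) on 3-bit vectors.
expr = ("lub", ("var", "x"),
        ("app", ("fvar", "F"),
         ("proj", 2, ("tup", ("var", "x"), ("var", "y")))))
rho = {"F": lambda v: (v << 1) & 0b111}
chi = {"x": 0b001, "y": 0b010}
result = evaluate(expr, {}, chi, rho)
```

In the example, the projection yields `y`, the function bound to `F` shifts it to `0b100`, and the join with `x` gives `0b101`.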

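For an equation $F_i(x_1, \ldots, x_n) = \mathit{rhs}_i$ the polynomial function $\mathcal{P}$ is defined as:

\[ \mathcal{P}[\![F_i]\!](\kappa, \rho, a_1, \ldots, a_n) = \mathcal{E}[\![\mathit{rhs}_i]\!](\kappa, [x_1/a_1, \ldots, x_n/a_n], \rho) \]

For an equation system $(F_i(x_1, \ldots, x_n) = \mathit{rhs}_i \mid 1 \le i \le p)$ the semantics is
defined as the least fixed point of the equation system

\[ (f_1, \ldots, f_p) = (\mathcal{P}[\![F_1]\!](\kappa, [F_1/f_1, \ldots, F_p/f_p]), \ldots, \mathcal{P}[\![F_p]\!](\kappa, [F_1/f_1, \ldots, F_p/f_p])) \]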



4.2 Effects of Additive Base Functions 

We now define the skeletons which form the base of the implementation. Since
the additive framework works well for first-order functions, we start with a class
of additive first-order functions.

Definition 4. $S_{fo} := \{ f \in \mathcal{A} \mid f \text{ is a first-order function and } f \text{ is additive} \}$

Based upon this we can define the class used in our implementation.

Definition 5.

\begin{align*}
S_{ho} := \{ f \in \mathcal{A},\ f : t_1 \times \ldots \times t_n \to t \mid{} & f \in S_{fo} \text{ or} \\
& f \text{ is component-wise additive and } t \notin \mathit{Tup}(\mathbb{B}) \Rightarrow \\
& \forall (a_1, \ldots, a_n) : f(a_1, \ldots, a_n) \in S_{ho} \}
\end{align*}

The second part in the definition of $S_{ho}$ states that results of a component-
wise additive function which are of functional type have to be component-wise
additive themselves. We need this condition to ensure that all functions occurring
during the calculation are component-wise additive.

Note that we chose additive first-order functions as the base of $S_{ho}$ instead of
component-wise additive ones. Since we can do better optimisations for additive functions,
it makes sense to include them here.

We now investigate the effect of the restriction to functions in the class $S_{ho}$
on the semantics of our language. For the first-order part, i.e. equation systems
which contain neither higher-order argument nor function variables, we can prove
the following theorem by induction over the structure of $\mathit{Exp}$:

Theorem 4. Let $e \in \mathit{Exp}^t$, $t \in \mathit{Tup}(\mathbb{B})$, and $X = \{x_1, \ldots, x_n\}$. If $e$ contains
neither higher-order arguments nor function variables, then:

\begin{align*}
\mathcal{E}[\![e]\!](\kappa, [x_1/a_1 \sqcup b_1, \ldots, x_n/a_n \sqcup b_n], \rho) ={} & \mathcal{E}[\![e]\!](\kappa, [x_1/a_1, \ldots, x_n/a_n], \rho) \; \sqcup \\
& \mathcal{E}[\![e]\!](\kappa, [x_1/b_1, \ldots, x_n/b_n], \rho)
\end{align*}

if with $F = \{F_1, \ldots, F_m\}$ holds $\forall i \in \{1, \ldots, m\}$: $F_i$ has a first-order type $\Rightarrow$
$\rho(F_i)$ is additive.

This theorem says that the solution of a first-order equation system will only 
contain additive functions. Hence, no non-additive functions will appear in the 
iteration. We can use this result for an efficient implementation as described 
in the previous sections. For higher-order systems we can prove the following 
theorem: 

Theorem 5. Let $e \in \mathit{Exp}^t$, $F = \{F_1, \ldots, F_m\}$, and $X = \{x_1, \ldots, x_n\}$. Further-
more, let $\rho$ be such that $\rho(F_i) \in S_{ho}$ and let $a_i, b_i \in S_{ho}$ for all $x_i$ with functional type.
In addition, we assume (*). Then

\[ \mathcal{E}[\![e]\!](\kappa, [x_1/a_1 \sqcup b_1, \ldots, x_n/a_n \sqcup b_n], \rho) = \bigsqcup_{(y_1, \ldots, y_n) \in \prod_{i=1}^n \{a_i, b_i\}} \mathcal{E}[\![e]\!](\kappa, [x_1/y_1, \ldots, x_n/y_n], \rho) \]

and $\mathcal{E}[\![e]\!](\kappa, [x_1/a_1 \sqcup b_1, \ldots, x_n/a_n \sqcup b_n], \rho) \in S_{ho}$ if the type of $e$ is functional.

We present the condition (*) later in the proof and explain it there, because
it only makes sense in that context. Before we give the proof, we
explain the meaning of the theorem. Similar to Theorem 4, this one implies that
all functions involved in the computation of the fixed point semantics belong to
the class $S_{ho}$. As a precondition we only need to start with the appropriate bottom
functions and have to ensure that condition (*) is met. Later we will introduce
an easily decidable syntactic property which implies (*).
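Proof. This proof is done by induction over the structure of $e$. To increase
readability, we use the symbol $\bigsqcup$ as an abbreviation for $\bigsqcup_{(y_1, \ldots, y_n) \in \prod_{i=1}^n \{a_i, b_i\}}$,
omit all uninteresting cases and consider only $e = (e_1\ e_2)$. We have:

\begin{align*}
& \mathcal{E}[\![e]\!](\kappa, [x_1/a_1 \sqcup b_1, \ldots, x_n/a_n \sqcup b_n], \rho) \\
={} & (\mathcal{E}[\![e_1]\!](\kappa, [x_1/a_1 \sqcup b_1, \ldots, x_n/a_n \sqcup b_n], \rho)\ \ \mathcal{E}[\![e_2]\!](\kappa, [x_1/a_1 \sqcup b_1, \ldots, x_n/a_n \sqcup b_n], \rho))
\end{align*}

By induction we get:

\[ = (\mathcal{E}[\![e_1]\!](\kappa, [x_1/a_1 \sqcup b_1, \ldots, x_n/a_n \sqcup b_n], \rho)\ \ \bigsqcup \mathcal{E}[\![e_2]\!](\kappa, [x_1/y_1, \ldots, x_n/y_n], \rho)) \]

Again by induction $\mathcal{E}[\![e_1]\!](\kappa, [x_1/a_1 \sqcup b_1, \ldots, x_n/a_n \sqcup b_n], \rho) \in S_{ho}$ holds. Hence:

\[ = \bigsqcup (\mathcal{E}[\![e_1]\!](\kappa, [x_1/a_1 \sqcup b_1, \ldots, x_n/a_n \sqcup b_n], \rho)\ \ \mathcal{E}[\![e_2]\!](\kappa, [x_1/y_1, \ldots, x_n/y_n], \rho)) \]

By induction the first subexpression yields a component-wise additive function.
Hence we get:

\begin{align*}
={} & \bigsqcup_{(y_1, \ldots, y_n) \in \prod_{i=1}^n \{a_i, b_i\}} \Big( \Big( \bigsqcup_{(z_1, \ldots, z_n) \in \prod_{i=1}^n \{a_i, b_i\}} \mathcal{E}[\![e_1]\!](\kappa, [x_1/z_1, \ldots, x_n/z_n], \rho) \Big)\ \ \mathcal{E}[\![e_2]\!](\kappa, [x_1/y_1, \ldots, x_n/y_n], \rho) \Big) \\
={} & \bigsqcup_{(y_1, \ldots, y_n) \in \prod_{i=1}^n \{a_i, b_i\}}\ \bigsqcup_{(z_1, \ldots, z_n) \in \prod_{i=1}^n \{a_i, b_i\}} (\mathcal{E}[\![e_1]\!](\kappa, [x_1/z_1, \ldots, x_n/z_n], \rho)\ \ \mathcal{E}[\![e_2]\!](\kappa, [x_1/y_1, \ldots, x_n/y_n], \rho))
\end{align*}

We now have to assume equality with the next line, because in general this
does not hold. This is the condition (*) in Theorem 5.

\begin{align*}
={} & \bigsqcup (\mathcal{E}[\![e_1]\!](\kappa, [x_1/y_1, \ldots, x_n/y_n], \rho)\ \ \mathcal{E}[\![e_2]\!](\kappa, [x_1/y_1, \ldots, x_n/y_n], \rho)) \\
={} & \bigsqcup \mathcal{E}[\![e]\!](\kappa, [x_1/y_1, \ldots, x_n/y_n], \rho)
\end{align*}

Because by induction the semantics of $e_1$ yields a function in $S_{ho}$, the result of
the application is again in $S_{ho}$, if the result is functional at all. The proof of
Theorem 5 is now complete.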

In the implementation of our package we exploited both theorems by using the 
optimisations described in the previous sections for all additive first-order func- 
tions. For the higher-order part of our language Theorem 5 justifies the use of 
the optimisations for component-wise additive functions. Hence, we can repre- 
sent higher-order functions with polynomial space and therefore we can solve 
higher-order equation systems in polynomial time. We will now present a class 
of expressions that fulfils condition (*). Whether an expression belongs to this 
class or not, can be tested very easily given the proposition of the following 
theorem. 

Theorem 6. Condition (*) holds for an expression $e$ of the form $(e_1\ e_2)$ if the
sets of argument variables of $e_1$ and $e_2$ are disjoint.

Therefore we only have to decide whether all applications in an equation system meet
this condition. If this is the case, all functions occurring in the fixed point iter-
ation will belong to $S_{ho}$. We omit the proof since it is mainly technical.
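
The syntactic test suggested by Theorem 6 is easy to sketch. The Python code below is an illustration only: it assumes a hypothetical encoding of expressions as nested tuples with the tags `const`, `var`, `fvar`, `lub`, `tup`, `proj` and `app`, which is not the package's actual data structure.

```python
def arg_vars(e) -> set:
    """Set of argument variables occurring in expression e."""
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag in ("const", "fvar"):
        return set()
    # Sub-expressions are the tuple-valued fields (skips the index of "proj").
    return set().union(*(arg_vars(sub) for sub in e[1:] if isinstance(sub, tuple)))


def applications_disjoint(e) -> bool:
    """True iff every application (e1 e2) inside e has disjoint argument variables."""
    tag = e[0]
    if tag in ("const", "var", "fvar"):
        return True
    if tag == "app" and arg_vars(e[1]) & arg_vars(e[2]):
        return False
    return all(applications_disjoint(sub) for sub in e[1:] if isinstance(sub, tuple))


# ((F x) y): the variable sets {x} and {y} are disjoint.
good = ("app", ("app", ("fvar", "F"), ("var", "x")), ("var", "y"))
# ((F x) x): x occurs on both sides of the outer application.
bad = ("app", ("app", ("fvar", "F"), ("var", "x")), ("var", "x"))
```

An equation system would pass the test if `applications_disjoint` holds for every right-hand side; here it accepts `good` and rejects `bad`.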

The drawback of the compact representation of values as lists of their ir-
reducible components, described at the end of the last section, is that we have
to find a compact representation for the join-irreducible elements of a type
$t$. Fortunately we found an enumeration of the join-irreducibles of a type $t$
yielding a coding into integers. This coding allows us to implement all oper-
ations very efficiently. The most expensive operation is the application, which
can be done by a sequence of divisions and modulo operations. Its costs are in
$O(n \cdot l_f \log l_f + l_f \cdot \sum_{i=1}^{n} l_i)$, where $l_f$ is the length of the list representing the
function and the $l_i$ are the lengths of the argument lists. But this coding is
quite complicated, so we will not go into details.



5 Experimental Results 

In this section we present some experimental results obtained by using our pack- 
age. First we show results for an arbitrarily chosen equation system and after 
that the results of an implementation of escape analysis. All systems were tested 
using a naive representation which simply stores all function values for all ar- 
gument tuples. We also tested the additive approach for first-order functions 
(FOadd). Here we coded first-order functions in vectors of a fixed, type depen- 
dent length taking advantage of additivity in the way mentioned above. We also 
implemented the approach for component-wise additive functions (HOadd). For
first-order we used all possible optimisations for additivity and for higher-order 
we exploited the component-wise additivity. The functions were coded as lists of 
integers with variable length. The ideas of all implementations were sketched in 
the last section. We also present some results for a BDD based approach. 



5.1 Two Simple Examples 

We chose the following system for two reasons. First, we want an equation system
which contains all features of our language. Second, we want to demonstrate
the effects of changing the size of the underlying basic domain on runtime and
memory usage. We change the size of our domain by modifying the length n of
the bit vectors used.

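$f_1 : B^n, B^n \to B^n$
$f_1(x, y) = x \mathbin{\mathrm{lub}} y \mathbin{\mathrm{lub}} \mathrm{proj}^{?}(f_4(y, x, x, y))$

$f_2 : B^n, B^n, B^n \to B^n$
$f_2(x, y, z) = f_1(x, z) \mathbin{\mathrm{lub}} y \mathbin{\mathrm{lub}} x$

$f_3 : B^n, B^n \to (B^n, B^n)$
$f_3(x, y) = \mathrm{tup}(f_1(x, x), f_2(x, y, y))$

$f_4 : B^n, B^n, B^n, B^n \to (B^n, B^n)$
$f_4(v, w, x, y) = f_3(v, w) \mathbin{\mathrm{lub}} f_3(x, y)$

$f_5 : (B^n, B^n, B^n) \to (B^n, B^n, B^n)$
$f_5(x) = f_5(\mathrm{tup}(\mathrm{proj}^{?}(x), \mathrm{proj}^{?}(x), f_1(\mathrm{proj}^{?}(x), \mathrm{proj}^{?}(x))))$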

n                   1       2       3       4       5       6
naive t/ms         37     383    4919   81145    n.a.    n.a.
naive mem/byte    266     366    2220   37320    n.a.    n.a.
BDD t/ms          442    1806    4268    8140   13952   22124
BDD mem/byte    99328  140288  207872  244736  419840  565248

n                   1     3     5    10    20    30    40
FOadd t/ms         18    45    75   167   430   811  1305
FOadd mem/byte    264   274   302   428   923  1740  2892
HOadd t/ms         27    59    93   191   385   599   845
HOadd mem/byte     70    96   174   304   564   824  1084

Table 1. Runtimes and Maximum Memory Consumption ($f_1$-$f_5$)



The runtimes and maximum memory usage for finding the solution, depending
on the length n of the bit vectors involved, are shown in Table 1. As we can see,
runtime and memory usage grow exponentially with the length of the vectors, as
we predicted for the naive approach. Hence we cannot handle vectors of length 5
and more. In the case of BDDs the growth is not exponential but polynomial.
This is due to the coding of our domains into BDDs, which we cannot present
here. But even with BDDs we cannot handle large domains.

But if we look at HOadd and FOadd, things are a lot better. In the case of
FOadd, runtime and maximum memory usage depend quadratically on the size
of the bit vectors and thereby logarithmically on the size of the domain. For the
HOadd approach we get a nearly linear dependency that is caused by the coding
of functions as lists.

We now extend the above system by some higher-order equations: 

$f_6 : (B^n \to B^n), (B^n \to B^n), B^n \to B^n$
$f_6(f, g, x) = f_7(f, x) \mathbin{\mathrm{lub}} f_7(g, x) \mathbin{\mathrm{lub}} x$

$f_7 : (B^n \to B^n), B^n \to B^n$
$f_7(f, x) = f(x)$

$f_8 : B^n \to B^n$
$f_8(x) = f_6((f_1\ (0 \ldots 0)), x)$

The results for this system are summarised in Table 2. We can clearly see that
the HOadd variant is the only one yielding usable results for non-trivial basic
domains. By taking a closer look at the coding function, we can prove that
runtime and maximum memory usage are polynomial in n.



5.2 A Case Study: Escape Analysis 

To give an example of an abstract interpretation implemented using the
optimisations for additive and component-wise additive functions, we will take a
look at an abstract interpretation for escape analysis [Moh95b,Moh95a,Moh97].
We built it into a compiler for a simple constructor-based higher-order functional
language.

n                    1       2     3     4     5     6      7
naive t/ms          63    5862  n.a.  n.a.  n.a.  n.a.   n.a.
naive mem/byte     412   33406  n.a.  n.a.  n.a.  n.a.   n.a.
BDD t/ms           567    6197  n.a.  n.a.  n.a.  n.a.   n.a.
BDD mem/byte    107520  347136  n.a.  n.a.  n.a.  n.a.   n.a.
FOadd t/ms          45    1349  n.a.  n.a.  n.a.  n.a.   n.a.
FOadd mem/byte     410    2492  n.a.  n.a.  n.a.  n.a.   n.a.
HOadd t/ms          49     152   445  1281  3574  9485  23568
HOadd mem/byte     125     271   633  1379  2725  4935   8321

Table 2. Runtimes and Maximum Memory Consumption ($f_1$-$f_8$)



This abstract interpretation was designed for a monomorphic, constructor-based
functional language with higher-order functions, to reduce memory usage and
runtime by doing compile-time garbage collection. It yields information about
which parts of the arguments in a function application can escape, i.e. which
heap cells reappear in the result, and which parts of the arguments have become
garbage on the heap.

We can also use escape analysis to determine the behaviour of higher-order
functions. If we implement a language that includes λ-abstraction and partial
application, we need to represent functional values as closures on the heap. We
must do this because new functions can be created at runtime and be returned
as the result of a function call. However, in many cases these closures do not
escape from a local context. By performing escape analysis we can approximate
this property and create the closures on the stack or even statically.

We will not present this abstract interpretation here. A formal definition can
be found in [Moh97]. The important property of this escape analysis is that
it only uses additive basic operations. Therefore, we can use all optimisations
presented here. Here, we study the results of escape analysis for a quicksort
implementation. We compare the results for a simple first-order and a more
sophisticated implementation of quicksort using higher-order functions. Because
in this analysis the size of the domains depends on the data structures used, we
will give results for sorting lists of flat integer tuples with different numbers
of components. We obtained these results by building the escape analysis into a
compiler for a small higher-order constructor-based functional language. The
runtime results (in ms) for the first-order quicksort are in Table 3. There, n
denotes the number of components in the integer tuple.



n        1     2     3      4     5     6     7     8     9    10
naive  350  1670  7635  60160  n.a.  n.a.  n.a.  n.a.  n.a.  n.a.
FOadd  156   203   245    301   360   416   478   547   620   707
HOadd  132   186   233    292   344   400   456   514   575   639

Table 3. Runtimes for FO-Quicksort in ms



For the naive approach the runtime grows exponentially in the size of the
tuple. Again this is usable for small domains only. The situation is again better
for the additive variants. Here we have an almost linear runtime. This runtime
is in fact quadratic, as we could prove, but with small coefficients. Hence we can
use both approaches for first-order functions.

Now we will consider the analysis of quicksort written in functional style. In 
this implementation we used higher-order functions for implementing the filter 
functions, that yield the lists of the elements less or greater than the pivot 
element. We obtained the results in Table 4. 



n         1     2      3       4      5     6     7     8     9    10
naive  2440  8460  27460  133320   n.a.  n.a.  n.a.  n.a.  n.a.  n.a.
FOadd  1410  4060   9520   25050  75014  n.a.  n.a.  n.a.  n.a.  n.a.
HOadd   320   400    503     622    731   873  1009  1152  1299  1503

Table 4. Runtimes for HO-Quicksort in ms



As we can see here, for the naive approach we have a growth that is worse
than exponential, as in the other higher-order example. For the FOadd variant we
got an exponential growth, leaving the HOadd approach as the only one with a
bearable complexity. HOadd yields a polynomial complexity. We have
better results here, because the size of the tuples has a smaller effect on the
higher-order types here.



6 Conclusion and Future Work 

We have shown that the framework of pure additive functions presented in 
[NN92] is not expressive enough for higher-order abstract interpretations. Since 
function application is a non-additive operation needed in virtually all abstract 
interpretations, we cannot use this framework for implementing them. We pre- 
sented a new class of component-wise additive functions solving this problem. 
We have shown some properties of this class allowing us to solve higher-order 
equation systems in polynomial time. Hence we can implement a large class of 
higher-order abstract interpretations very efficiently. 

Furthermore, we presented a language-independent package for abstract in- 
terpretations allowing us to solve higher-order equation systems defined in a 
special language. We have proven the applicability of the optimisations possible 
in the component-wise framework for a class of higher-order equation-systems 
definable in our language. Hence we have shown the importance of our new class 
of functions, because it enabled us to implement a polynomial time fixed point 
solver. We also presented some experimental results including a case study for 
the escape analysis. We compared the runtimes of implementations based on our 
framework and other approaches like BDDs. The results have strongly justified
the introduction and use of our new class of functions. 

Future work will include further case studies for other abstract
interpretations. In particular, we are interested in doing an escape analysis for
Java using our package for abstract interpretations. More work needs to be done
in specifying other classes of equation systems that fulfil the condition mentioned
in Theorem 5. We also want to define some transformations that lead to systems
compliant with Theorem 6.

The package is available on request. 

Abstract. In the context of functional programming, semantic methods
are commonly used to drive program transformations. However, classical
semantic domains often rely on recursive objects which embed the
control flow of recursive functions. As a consequence, transformations
which have to modify the control flow are difficult to define. We propose
in this paper a new semantic domain where the control flow is defined
implicitly, and thus can be modified. This new theoretical and practical
framework makes it possible to define and extend, in a homogeneous way,
powerful transformations related to partial evaluation and deforestation.

Keywords: semantics, program transformation, partial evaluation, de- 
forestation. 



1 Introduction 

Many frameworks use some semantic abstraction to transform functional
programs, for instance the λ-calculus [11,8], catamorphisms [5], hylomorphisms [7],
and folds [9]. A more complete bibliography and a broad comparison of these
different frameworks can be found in [3].

All of them share a similar global structure. Thus, functional programs are 
abstracted in some mathematical domain. Transformations are performed on 
the abstraction of the program. Then, the transformed program is obtained by 
a backward translation from the mathematical domain to functional programs. 

For instance, the HYLO system [7] transforms a functional program into
hylomorphisms and then performs partial evaluation and sometimes deforestation,
thanks to many theorems (acid rain theorem, fusion law, ... [6]). Then, these new
hylomorphisms can be translated back into functional programs.

However, all frameworks we know share a surprising constraint: the
abstraction of functional programs always relies on “functional” objects where
recursive structures or schemes are strongly preserved and cannot be easily
modified. For instance, with the λ-calculus, the recursive calls are defined in
extenso in the structure of the λ-terms. With hylomorphisms (and folds), these
recursion schemes are exactly pointed out by functors which are used as
transformation parameters. Thus, a transformation cannot freely restructure
these recursive schemes.

We propose in this paper a new abstraction for functional programs, which
does not rely on such “functional” objects. In our semantics, the control flow is
neither defined nor fixed. A related operational semantics explicitly defines
recursive schemes and control flow. This latter semantics can be computed from
the former and makes it possible to perform backward translations into
functional programs.

This paper presents a homogeneous framework to define and extend classical
program transformations related to partial evaluation. In particular, the system
is quite powerful at transforming control flow and recursive schemes of functional
programs.

The paper is structured as follows. Section 2 fixes notations for functional 
programs and their semantics. In section 3, we introduce our mathematical ob- 
jects with an example, and then we precisely define them. They are named 
Equational Programs and their Equational Semantics is defined in section 4. 
Obtained results and powerful transformations are then presented in section 5. 
The end of the paper is about technical bases for all this framework: translation 
from functional programs is given in section 6, operational semantics is presented 
in section 7 and backward translation from equational programs into functional 
ones is presented in section 8. 

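Notations: we will assume standard definitions for sets and relations. We will also
make use of the following notations:

– when a set S is the singleton {s}, it is also denoted by s without brackets.

– $\mathcal{R} = [\mathcal{R}_1; \mathcal{R}_2; \ldots; \mathcal{R}_n]$ is the relation defined by:

    $a \mathrel{\mathcal{R}} b \iff a \mathrel{\mathcal{R}_1} a_1 \mathrel{\mathcal{R}_2} \cdots \mathrel{\mathcal{R}_n} b$

– $\mathcal{R}^n = [\mathcal{R}; \ldots; \mathcal{R}]$ (with n occurrences of $\mathcal{R}$).

– $\mathcal{R}^*$ is the transitive closure of $\mathcal{R}$.

– $\mathcal{R}_1 + \mathcal{R}_2$ is the relation $\mathcal{R}$ defined by

    $a \mathrel{\mathcal{R}} b \iff a \mathrel{\mathcal{R}_1} b$ or $a \mathrel{\mathcal{R}_2} b$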



2 Functional Programs 

In this section, we fix the functional programs we consider. In a few words, it is
a standard functional language with higher-order functions and pattern-matching.
Nevertheless, we will only consider well-typed programs, regardless of which kind
of type system is used. Programs are defined according to the BNF definition in
Figure 1-left-top.

In this definition, the # symbol means “arity of”. Type-constructors of
user-defined data types are denoted by c (for instance, cons, nil, etc.). Primitive
values and operations (integers, etc.) are denoted by $\pi$. Notice that this grammar
is sufficient to define higher-order functions, since partial applications are possible.
Expressions like fun x -> e which appear in classical functional programs can
be translated into a new function name. For instance, the program

let horev x = match x with
    cons a b -> let k = horev b in fun h -> (k (cons a h))
  | nil -> fun h -> h

Fig. 1. FP definition and semantics.



is just syntactic sugar for the following one:

let horev = fun
    cons a b -> ((f1 a) (horev b))
  | nil -> f2

let f1 a k h = (k (cons a h))
let f2 h = h
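
The desugaring above can be checked on a small case. The following is only a sketch in OCaml, not the paper's FP language: it assumes OCaml's built-in lists in place of the cons/nil constructors, and the name horev_lifted is hypothetical.

```ocaml
(* Direct version: the anonymous functions are written inline. *)
let rec horev x = match x with
  | a :: b -> let k = horev b in fun h -> k (a :: h)
  | [] -> fun h -> h

(* Lifted version: every "fun" has become a named top-level function,
   and the closures are built by partial application only. *)
let f1 a k h = k (a :: h)
let f2 h = h

let rec horev_lifted = function
  | a :: b -> (f1 a) (horev_lifted b)
  | [] -> f2

let () =
  let xs = [1; 2; 3] in
  (* Both versions compute the same functional value on this input. *)
  assert (horev xs [] = horev_lifted xs []);
  assert (horev_lifted xs [] = List.rev xs)
```

The lifted version needs no fun expression at all, which is why the grammar of FP can omit it.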

We consider a standard operational semantics for FP, in the call-by-value
style. It is defined by a relation $\rightarrow_e$, evaluated in an environment $\gamma$ which
associates variables to values (the empty environment is denoted by $\varepsilon$; the
association of x to v in $\gamma$ is denoted by $\gamma(x \mapsto v)$). Values are classically defined
in Figure 1-left-bottom.

We denote by $f_p$ the function f partially applied to exactly p arguments, in
order to be consistent with further definitions. For primitives, we will suppose
that a rewriting rule $\triangleright$ is available, such that for instance (+ 1 2) $\triangleright$ (3). The
operational semantics for FP is then defined in Figure 1-right.

3 Equational Programs

3.1 An intuitive presentation 

Consider the following program: 

let length = fun nil -> 0 | cons a b -> (+ 1 (length b))

We can have, as an example, the following execution (here, = gives intermediate
steps for $\rightarrow_e$):

length (cons 5 (cons 6 nil)) = 

(+ 1 (length (cons 6 nil))) = 

(+ 1 (+ 1 (length nil))) = 

(+ 1 (+ 1 0 )) = 2 

Now let us denote each list by a variable x, and the result of the function
length on x by the variable x.length (length will be called an attribute on x).
Intuitively, this implies the following equations:

($\forall x$) x = (cons a b) $\Rightarrow$ x.length = (+ 1 b.length)    (1)
($\forall x$) x = (nil) $\Rightarrow$ x.length = 0    (2)

When the variable x is associated to a term like (cons $t_1$ $t_2$), we use by
convention the variables x.1 and x.2 to respectively denote the sub-terms $t_1$ and
$t_2$. In this context, the previous execution could be represented by the following
list of statements, where x, x.2, and x.2.2 could be thought of as variable names:

x = (cons 5 (cons 6 nil))
x.length = (+ 1 x.2.length)
x.2 = (cons 6 nil)
x.2.length = (+ 1 x.2.2.length)
x.2.2 = (nil)
x.2.2.length = 0
x.2.length = (+ 1 0)
x.length = (+ 1 (+ 1 0))
x.length = 2

Note that the two equations (1) and (2) are similar to the functional pro- 
gram, and that the above list of statements satisfies them. When a function 
uses additional parameters, such a comparison is still possible. For instance, the 
program 

let rev h = fun nil -> h | cons a b -> rev (cons a h) b

is associated to the following equations where, for any list denoted by the variable
x, the parameter h is denoted by the variable x.rev_h and the result of the
function rev is denoted by the variable x.rev. Informally, x.rev stands for the
result of (rev x.rev_h x).

($\forall x$) x = (cons y z) $\Rightarrow$ x.rev = z.rev
($\forall x$) x = (cons y z) $\Rightarrow$ z.rev_h = (cons y x.rev_h)
($\forall x$) x = (nil) $\Rightarrow$ x.rev = x.rev_h

Thus, a set of equations seems to be sufficient to describe the values computed
by a functional program on list-like structures. For higher-order functions, we
have to manage partial applications, namely, expressions like ($e_1$ $e_2$). In such
situations, we propose to associate a fresh variable x to the expression $e_1$. Then
the variable x.arg is associated to $e_2$, and x.call to the (partial) application
($e_1$ $e_2$).

For instance, consider the function definition let f a b = (+ a b). The
function f could be partially applied, thus we need the following equations related
to arg and call:

($\forall x$) x = ($f_0$) $\Rightarrow$ x.call = ($f_1$ x.arg)
($\forall x$) x = ($f_1$ y) $\Rightarrow$ x.call = (+ y x.arg)

Here, the constructors $f_0$ and $f_1$ are used to represent functional values such
as ($f_0$) and ($f_1$ $v_1$), that is, the different partial applications of the function f.
The equations defining variable x.call from x.arg are then consistent with the
definition of ($e_1$ $e_2$) $\rightarrow_e$ given in Figure 1.
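
The encoding of partial applications by constructors can be sketched as ordinary data. The code below is an illustration only: the type names fval and value and the function call are hypothetical, and arguments are restricted to integers for brevity.

```ocaml
(* Functional values of "let f a b = (+ a b)" as first-order data. *)
type fval =
  | F0            (* f applied to no argument yet *)
  | F1 of int     (* f partially applied to one argument *)

(* The result of an application is either another functional value
   or an integer. *)
type value = Fun of fval | Int of int

(* One application step: computes x.call from x and x.arg. *)
let call (x : fval) (arg : int) : value =
  match x with
  | F0 -> Fun (F1 arg)       (* x = (f0)   =>  x.call = (f1 x.arg)  *)
  | F1 y -> Int (y + arg)    (* x = (f1 y) =>  x.call = (+ y x.arg) *)

let () =
  (* ((f 3) 4) is evaluated as two successive call steps. *)
  match call F0 3 with
  | Fun g -> assert (call g 4 = Int 7)
  | Int _ -> assert false
```

Each case of call mirrors one of the two equations, so no function call is needed in the equational setting beyond the attributes arg and call.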

Thus, it seems possible to represent programs by a set of equations. Of course, 
we have to formalize such a translation, and to prove a semantic equivalence 
between the two representations. This is the aim of the paper, and we will 
start with a definition of what is an equational program. The section 4 gives 
its semantics, and we will present in section 6 a translation from functional 
programs into equational ones. 

3.2 Definitions 

The following definitions are mutually recursive and must be considered all to- 
gether. 

Terms: Terms are built using constructors or primitives with variables or
sub-terms as parameters. There is no function call. The set of the variables
appearing in a term t is denoted by Vars(t).

Values: a value v is a term which contains no variable, i.e. Vars(v) = $\emptyset$.

Variables: They name or represent terms. A variable can have several forms:

– a simple name (an identifier).

– x.k (k is an integer) represents the k-th sub-term of (the term represented
by) the variable x.

– x.a (a is an attribute name) represents the attribute a attached to the
variable x.

– x.$L_k$ (k is an integer) represents the k-th local variable associated to the
variable x.

The form x.$L_k$ is just a way to make a new variable name which is associated
only to the variable x and could be used as a fresh local variable to name
intermediate or dynamic computations.

Attributes: There are two sets of attributes, $\mathcal{R}es$ for attributes which represent
the results of a computation, and $\mathcal{P}rm$ for those which represent the parameters
of a computation. Then, with $p \in \mathcal{P}rm$ and $r \in \mathcal{R}es$, the variable x.r represents
the result of computing the attribute r on x when the attribute p is equal to the
term represented by x.p.

Statements: A statement is an oriented equation of the form x = t, where the
left-hand side is restricted to be a variable. A system $\Sigma$ is a set of statements.

Equations and Program: A program is defined by a set of equations, which are
restricted to be of the following form:

($\forall x$) x = (c $y_1 \ldots y_m$) $\Rightarrow$ z = t

where the statement z = t may refer to x and $y_1 \ldots y_m$. Thus, we will use the
shortcut notation c $\rightarrow$ z = t where the variable x is replaced by the special
identifier $\alpha$ and $y_1 \ldots y_m$ by $\alpha.1 \ldots \alpha.m$. For instance, the equational program
associated with the function rev is:

cons $\rightarrow$ $\alpha$.rev = $\alpha$.2.rev
cons $\rightarrow$ $\alpha$.2.rev_h = (cons $\alpha$.1 $\alpha$.rev_h)
nil $\rightarrow$ $\alpha$.rev = $\alpha$.rev_h

These definitions are summarized in Figure 2.

$\mathcal{P}$     := (c $\rightarrow$ stmt)*
stmt  := x = t
x     := $\alpha$ | x.k | x.a | x.$L_k$
t     := x
       | (c $t_1 \ldots t_n$)      n = #c
       | ($\pi$ $t_1 \ldots t_n$)      n = #$\pi$
$\Sigma$     := stmt*

Fig. 2. Equational Programs



4 Equational Semantics 

We define here a semantics for a given equational program $\mathcal{P}$. The intuitive
idea consists in computing output-statements $\Sigma_{out}$ from input-statements $\Sigma_{in}$,
by adding new statements such that the equations of the program $\mathcal{P}$ remain
satisfied.








L. Correnson et al. 



4.1 Substitutions 

Two kinds of substitutions are involved here. The first one, denoted by [x := t],
replaces each whole occurrence of the variable x with the term t. Thus, we have:
(+ 1 x)[x := t] = (+ 1 t), but (+ 1 x.a)[x := t] = (+ 1 x.a) since x.a is not a
whole occurrence of x.

The second kind of substitution, denoted by [x], replaces the special identifier
$\alpha$ by x everywhere it appears, even inside variables. Thus, we have (+ 1 $\alpha$)[x] =
(+ 1 x), and (+ 1 $\alpha$.a)[x] = (+ 1 x.a).
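
The difference between the two substitutions can be made concrete with a small data type for variables and terms. This is a minimal sketch under assumptions: the constructor names (Alpha, Name, Sub, Attr, Local, Var, Con, Prim, Lit) and the functions subst and inst are hypothetical, and primitive values are limited to integer literals.

```ocaml
type var =
  | Alpha                    (* the special identifier *)
  | Name of string           (* a simple name *)
  | Sub of var * int         (* x.k : k-th sub-term *)
  | Attr of var * string     (* x.a : attribute a *)
  | Local of var * int       (* x.L_k : k-th local variable *)

type term =
  | Var of var
  | Lit of int
  | Con of string * term list    (* constructor application *)
  | Prim of string * term list   (* primitive application *)

(* t[x := u] : replaces whole occurrences of the variable x only. *)
let rec subst x u = function
  | Var y when y = x -> u
  | Var y -> Var y
  | Lit n -> Lit n
  | Con (c, ts) -> Con (c, List.map (subst x u) ts)
  | Prim (p, ts) -> Prim (p, List.map (subst x u) ts)

(* v[x] on variables: replaces Alpha even inside a variable. *)
let rec inst_var x = function
  | Alpha -> x
  | Name s -> Name s
  | Sub (v, k) -> Sub (inst_var x v, k)
  | Attr (v, a) -> Attr (inst_var x v, a)
  | Local (v, k) -> Local (inst_var x v, k)

(* t[x] on terms. *)
let rec inst x = function
  | Var v -> Var (inst_var x v)
  | Lit n -> Lit n
  | Con (c, ts) -> Con (c, List.map (inst x) ts)
  | Prim (p, ts) -> Prim (p, List.map (inst x) ts)

let () =
  let x = Name "x" and u = Con ("nil", []) in
  let plus1 t = Prim ("+", [Lit 1; t]) in
  assert (subst x u (plus1 (Var x)) = plus1 u);
  assert (subst x u (plus1 (Var (Attr (x, "a")))) = plus1 (Var (Attr (x, "a"))));
  assert (inst x (plus1 (Var Alpha)) = plus1 (Var x));
  assert (inst x (plus1 (Var (Attr (Alpha, "a")))) = plus1 (Var (Attr (x, "a"))))
```

The four assertions restate the four examples given in the text.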

4.2 Derivations 

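A step is a relation denoted by $\rightarrow_{\mathcal{P}}$, such that $\Sigma \rightarrow_{\mathcal{P}} \Sigma'$ holds if and only if one
of the following rules holds¹:

– the sub-term rule, which deals with sub-term variables, holds when
$x = (c\ t_1 \ldots t_n) \in \Sigma$ and $\Sigma' = \Sigma \cup \{x.k = t_k\}$.

– the substitution rule holds when $x = t \in \Sigma$, $y = t' \in \Sigma$ and
$\Sigma' = \Sigma \cup \{x = t[y := t']\}$.

– primitive operations are handled by the primitive rule, which holds when
$x = t \in \Sigma$, $t \triangleright t'$ and $\Sigma' = \Sigma \cup \{x = t'\}$.

– finally, the instantiation rule deals with applying an equation of the program
$\mathcal{P}$. This rule holds when $x = (c\ t_1 \ldots t_n) \in \Sigma$, when there exists an equation
of the form $c \rightarrow y = t$ in $\mathcal{P}$, and when $\Sigma' = \Sigma \cup \{y[x] = t[x]\}$. In this special
case, the step $\Sigma \rightarrow_{\mathcal{P}} \Sigma'$ is also annotated with the instantiated equation
when that equation should be pointed out.

Remark that if the instantiation rule is not used, the relation $\rightarrow_{\mathcal{P}}$ could be
replaced by $\rightarrow_{\emptyset}$. Now, we define the relation $\Rightarrow_{\mathcal{P}}$ by:

    $\Sigma \Rightarrow_{\mathcal{P}} \Sigma' \iff \Sigma \rightarrow_{\mathcal{P}}^{*} \Sigma''$ and $\Sigma' \subseteq \Sigma''$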

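Intuitively, $\Sigma \Rightarrow_{\mathcal{P}} \Sigma'$ means that there exists a derivation from $\Sigma$ which
produces at least the equations $\Sigma'$. For instance, consider the following program
$\mathcal{P}$:

cons $\rightarrow$ $\alpha$.l = (+ 1 $\alpha$.2.l)
nil $\rightarrow$ $\alpha$.l = (0)

Possible derivations lead to:

{$\alpha$ = (cons 1 nil)} $\Rightarrow_{\mathcal{P}}$ {$\alpha$.2 = (nil)}
                    $\Rightarrow_{\mathcal{P}}$ {$\alpha$.2.l = (0)}
                    $\Rightarrow_{\mathcal{P}}$ {$\alpha$.l = (+ 1 $\alpha$.2.l)}
                    $\Rightarrow_{\mathcal{P}}$ {$\alpha$.l = (1)}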
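
The four rules can be exercised on this example with a small saturation loop. The code below is a sketch under several assumptions and is not the authors' implementation: variables are modelled as paths (a root followed by selectors, with the root "alpha" standing for the special identifier), the only primitive is addition, the primitive rule is applied by reducing inside the whole term, each round applies every rule to every statement, and a fuel bound guards against non-termination. All names are hypothetical.

```ocaml
type var = string list                       (* root followed by selectors *)
type term = V of var | N of int | C of string * term list | Add of term * term

let rec subst y t' = function                (* t[y := t'], whole occurrences *)
  | V x when x = y -> t'
  | C (c, ts) -> C (c, List.map (subst y t') ts)
  | Add (a, b) -> Add (subst y t' a, subst y t' b)
  | t -> t

let inst_var x = function "alpha" :: sel -> x @ sel | v -> v
let rec inst x = function                    (* t[x] *)
  | V v -> V (inst_var x v)
  | C (c, ts) -> C (c, List.map (inst x) ts)
  | Add (a, b) -> Add (inst x a, inst x b)
  | t -> t

let rec reduce = function                    (* primitive rewriting *)
  | Add (a, b) ->
      (match reduce a, reduce b with
       | N m, N n -> N (m + n)
       | a', b' -> Add (a', b'))
  | C (c, ts) -> C (c, List.map reduce ts)
  | t -> t

let flat_map f l = List.concat (List.map f l)

(* One round: apply every rule to every statement of sigma. *)
let step program sigma =
  let sub_term = flat_map (function
      | x, C (_, ts) -> List.mapi (fun k t -> (x @ [string_of_int (k + 1)], t)) ts
      | _ -> []) sigma in
  let substitution = flat_map (fun (x, t) ->
      List.map (fun (y, t') -> (x, subst y t' t)) sigma) sigma in
  let primitive = List.map (fun (x, t) -> (x, reduce t)) sigma in
  let instantiation = flat_map (function
      | x, C (c, _) ->
          flat_map (fun (c', (y, t)) ->
              if c = c' then [ (inst_var x y, inst x t) ] else []) program
      | _ -> []) sigma in
  List.sort_uniq compare (sigma @ sub_term @ substitution @ primitive @ instantiation)

let rec saturate fuel program sigma =
  let sigma' = step program sigma in
  if fuel = 0 || sigma' = List.sort_uniq compare sigma then sigma'
  else saturate (fuel - 1) program sigma'

(* cons -> alpha.l = (+ 1 alpha.2.l)    nil -> alpha.l = (0) *)
let program = [
  ("cons", (["alpha"; "l"], Add (N 1, V ["alpha"; "2"; "l"])));
  ("nil",  (["alpha"; "l"], N 0)) ]

let () =
  let sigma_in = [ (["alpha"], C ("cons", [N 1; C ("nil", [])])) ] in
  let sigma = saturate 20 program sigma_in in
  assert (List.mem (["alpha"; "2"], C ("nil", [])) sigma);
  assert (List.mem (["alpha"; "2"; "l"], N 0) sigma);
  assert (List.mem (["alpha"; "l"], N 1) sigma)
```

The statements checked at the end are the ones listed in the derivation above. Because statements are only ever added, the order in which the rules fire does not change what can eventually be derived.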

¹ To shorten the definitions, each “free” variable which appears in them is
assumed to be universally quantified ($\forall x \ldots$).
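
The relation $\Rightarrow_{\mathcal{P}}$ enjoys many properties. For instance, the following
theorem holds²:

Theorem 41 $\Rightarrow_{\mathcal{P}}$ is monotonic, that is:
if $(\forall i)\ \Sigma_i \Rightarrow_{\mathcal{P}} \Sigma'_i$ then $(\bigcup_i \Sigma_i) \Rightarrow_{\mathcal{P}} (\bigcup_i \Sigma'_i)$.

As a direct consequence, $\Rightarrow_{\mathcal{P}}$ is confluent. We have also:

$\Sigma \Rightarrow_{\mathcal{P}} \Sigma'$ if and only if $(\forall x)\ \Sigma[x] \Rightarrow_{\mathcal{P}} \Sigma'[x]$.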

4.3 Semantics 

We are interested in using equational programs to perform program
transformations. So we need a semantics which only considers which values are
computed by a program, not how they are computed. Consider the system
$\Sigma_{in} = \{\alpha = v\}$, where v is a value (i.e. a term with no variable). Any derivation
from $\Sigma_{in}$ is a trace of an execution of the program $\mathcal{P}$ on the value v. The resulting
values are statements of the form $\alpha.r = v_r$ where $r \in \mathcal{R}es$. An interesting semantics
associated to $\mathcal{P}$ should not consider any complete derivation, but only the values
v and $v_r$.

More precisely, the semantics of an equational program $\mathcal{P}$ is defined according
to a pair (P, R), where P is a sequence of parameter attributes, and R a sequence of
result ones. If $P = \{p_1 \ldots p_n\} \subseteq \mathcal{P}rm$ and $R = \{r_1 \ldots r_m\} \subseteq \mathcal{R}es$, the semantics
of $\mathcal{P}$ according to (P, R) is denoted by $[\![\mathcal{P}]\!]_{P,R}$ and is the relation between tuples
of values defined by:

$$(v, v_1 \ldots v_n)\ [\![\mathcal{P}]\!]_{P,R}\ (w_1 \ldots w_m) \iff \{\alpha = v,\ \alpha.p_1 = v_1,\ \ldots,\ \alpha.p_n = v_n\} \Rightarrow_{\mathcal{P}} \{\alpha.r_1 = w_1,\ \ldots,\ \alpha.r_m = w_m\}$$

Thus two programs $\mathcal{P}$ and $\mathcal{P}'$ are equivalent if and only if their semantics
are equal (using the standard equality on relations) for all pairs (P, R). With
such a definition, if $\mathcal{P}$ and $\mathcal{P}'$ are equivalent, they may use completely different
derivations, but they must compute the same values.



5 Results 

As presented in Section 3, there is a translation from functional programs to
equational ones, which is correct with respect to both functional and equational
semantics. Moreover, the inverse translation exists, and is also correct. The first
one is defined in section 6, and the second one is defined in sections 7 and 8.

The translation from an equational program $\mathcal{P}$ into a functional program
FP computes the functions that could be defined according to the equations in
$\mathcal{P}$. Thus, transformations are no longer restricted by any fixed recursion scheme.
Moreover, it is possible to freely add new equations to $\mathcal{P}$ if they are consistent
with its semantics. These additions may not directly participate in any
“function” recursion. The translation from $\mathcal{P}$ into FP will decide which equations
of $\mathcal{P}$ have to be taken into account to define functions.

² Note that $\Sigma[x] = \{y[x] = t[x] \mid y = t \in \Sigma\}$.

Actually, all of the transformations presented here, and many others, have 
been implemented in a completely systematic transformation system. Our im- 
plementation takes a functional program, translates it in an equational one, 
transforms it, and then produces back a new functional program. 

5.1 Partial evaluation 

Thanks to Theorem 41, partial evaluation is easy to define. Consider the
following relation: $\{\alpha = (c\ \alpha.1 \ldots \alpha.n)\} \Rightarrow_{\mathcal{P}} \Sigma$.

Then, $\Sigma[x]$ is a set of statements that can be deduced from any system
containing an equation of the form x = (c ...). It is then possible to prove that
adding the statements $\{c \rightarrow x = t \mid x = t \in \Sigma\}$ to $\mathcal{P}$ is consistent with its
semantics. Now, computing the operational semantics for $\mathcal{P}$ will automatically
benefit from these new equations. The final result obtained is a partial
evaluation of $\mathcal{P}$. For this method of partial evaluation, the only problem of
termination comes from functional programs that loop infinitely.

5.2 Approximative dependences 

The operational semantics of an equational program $\mathcal{P}$ points out some
dependences between parameter attributes and result ones. Thus, in the relation
$[\![\mathcal{P}]\!]_{P,R}$, the attributes of R depend on those of P. Fortunately, it is easy to
compute an approximation of these dependences, denoted $\mathcal{D}ep$. We expect that if
the result attribute r may depend on the parameter attribute p, then $p \in \mathcal{D}ep(r)$.
This approximation is computed by looking for every equation which may
participate in the computation of the attribute r, and by collecting every parameter
attribute involved. This analysis will be very useful for further transformations.
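
One simple way to obtain such an approximation is a reachability computation over attribute names. The sketch below is an assumption about how this could be done, not the analysis of the actual system: every equation is abstracted to the attribute it defines and the attributes it reads, the variables they are attached to are ignored, and the names equations, params and dep are hypothetical.

```ocaml
module S = Set.Make (String)

(* Each equation abstracted as (attribute defined, attributes read).
   The first three come from rev, the last two from length. *)
let equations = [
  ("rev", ["rev"]);          (* cons -> alpha.rev = alpha.2.rev *)
  ("rev_h", ["rev_h"]);      (* cons -> alpha.2.rev_h = (cons alpha.1 alpha.rev_h) *)
  ("rev", ["rev_h"]);        (* nil  -> alpha.rev = alpha.rev_h *)
  ("length", ["length"]);    (* cons -> alpha.length = (+ 1 alpha.2.length) *)
  ("length", []) ]           (* nil  -> alpha.length = 0 *)

let params = S.of_list ["rev_h"]

(* Parameter attributes reachable from r through the equations. *)
let dep r =
  let rec close seen = function
    | [] -> seen
    | a :: todo when S.mem a seen -> close seen todo
    | a :: todo ->
        let read =
          List.concat
            (List.map (fun (d, us) -> if d = a then us else []) equations) in
        close (S.add a seen) (read @ todo)
  in
  S.inter params (close S.empty [r])

let () =
  assert (S.elements (dep "rev") = ["rev_h"]);
  assert (S.is_empty (dep "length"))
```

Because every equation defining a reachable attribute is taken into account, the result can only over-approximate the real dependences, which is the safe direction for the transformations that follow.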

5.3 Specialization 

Specialization of a function f is, for example, a new function g such that g x =
(f K x) where K is a constant (a value). Sometimes, introducing such a function
g makes simplifications possible. In equational programs, a specialization is
defined in two parts. Let p be a parameter attribute and K a constant; a new
attribute r' is defined for every result attribute r such that $p \in \mathcal{D}ep(r)$. As
the first step, the definition of $\Sigma \rightarrow_{\mathcal{P}} \Sigma'$ is extended by the rule where
$\Sigma' = \Sigma \cup \{x.r = x.r'\}$ if $x.p = K \in \Sigma$. As the second step, we add new
equations for each constructor c, namely:

c $\rightarrow$ $\alpha$.r' = $\alpha$.$L_m$.r
c $\rightarrow$ $\alpha$.$L_m$.p' = $\alpha$.p'    ($\forall p' \in \mathcal{D}ep(r) - \{p\}$)
c $\rightarrow$ $\alpha$.$L_m$.p = K

The local variable $\alpha$.$L_m$ is supposed to be fresh. The specialization is then
automatically performed by partial evaluation.

5.4 Deforestation

Deforestation is an extension of the specialization dealing with function compo- 
sitions. In functional terms, it consists in defining a function h such that h x 
= f (g x) . There are well known methods to simplify function compositions, 
but they are not powerful enough, especially in the presence of additional pa- 
rameters. In most cases, the problems come from the difficulty to change the 
recursion scheme of a function in the context of standard semantics for func- 
tional programs. In the context of equational programs, recursive schemes are 
computed from equations, so the problem is simpler. Actually, the composition 
of two attributes can be defined in a way similar to specialization, by introducing 
new attributes and by extending the relation $\rightarrow_{\mathcal{P}}$. The deforestation works well,
even through parameters, as illustrated by the following examples. 

The definition of deforestation depends on the kind - result or parameter - of the 
involved attributes. 

Result-deforestation: Suppose that r and s are two result attributes, with
$\mathcal{D}ep(r) = \{p_1 \ldots p_n\}$ and $\mathcal{D}ep(s) = \{q_1 \ldots q_m\}$. When the attribute s is computed on
the result given by r, we say that there is a composition of r and s. It is then
defined by a new result attribute r' and by the new parameter attributes
$q'_1 \ldots q'_m$, to avoid any name-clash. Thus, we can extend the relation
$\Sigma \rightarrow_{\mathcal{P}} \Sigma'$ by a new rule where $\Sigma' = \Sigma \cup \Sigma^+$ if $x = y.r \in \Sigma$. $\Sigma^+$ is defined in the
sequel. The new attributes involve new equations, for each constructor c involved
by the computation of r:

$\Sigma^+ = \{\, x.s = y.r',\ \ y.q'_j = x.q_j \,\}$

c $\rightarrow$ $\alpha$.r' = $\alpha$.$L_{loc}$.s
c $\rightarrow$ $\alpha$.$L_{loc}$.$q_j$ = $\alpha$.$q'_j$
c $\rightarrow$ $\alpha$.$L_{loc}$ = $\alpha$.r

The local variable $\alpha$.$L_{loc}$ is supposed to be fresh.

Parameter-deforestation: To deforest through parameters, the solution is similar.
Suppose the result attribute r is computed on a parameter attribute p, with
$\mathcal{D}ep(r) = \{p_1 \ldots p_n\}$. Then the new result attributes are $r'_1 \ldots r'_n$, and the new
parameter attribute is p'. Then, we extend the relation $\Sigma \rightarrow_{\mathcal{P}} \Sigma'$ by a new rule
where $\Sigma' = \Sigma \cup \Sigma^+$ if $x = y.p \in \Sigma$. $\Sigma^+$ is defined in the sequel. The new
attributes involve new equations, for each constructor c where a variable x.p is
involved:

$\Sigma^+ = \{\, x.r = y.p',\ \ y.r'_i = x.p_i \,\}$

c $\rightarrow$ x.p' = x.$L_{loc}$.r
c $\rightarrow$ x.$L_{loc}$.$p_i$ = x.$r'_i$
c $\rightarrow$ x.$L_{loc}$ = x.p

The local variable x.$L_{loc}$ is supposed to be fresh.



5.5 Examples 

All these examples come from the implementation of our system. It is available
on the web at http://www-rocq.inria.fr/~correnso/agdoc/.

(left)
let flat x h = match x with
    node a b -> flat a (flat b h)
  | leaf n -> cons n h

let flatten x = flat x nil

let f x = reverse (flatten x)

(right)
let f =
  fun t_27 -> (((fpfun_1 t_27) nil))

let fpfun_1 =
  fun t_42 -> (
  fun t_43 -> (match t_42 with
    | node t_44 t_45 ->
        ((fpfun_1 t_45)
          ((fpfun_1 t_44) t_43))
    | leaf t_51 ->
        ((cons t_51) t_43)
  ))

Fig. 3. flatten and reverse



Reversed flatten: the function f given in Figure 3-left takes a binary tree, flattens 
its leaves, and then reverses the obtained list. After four steps of deforestation, 
the program in figure 3-right is obtained. One can observe that it is a variant of 
the function flat where the tree is flattened in the reversed direction. So, our 
analysis and deforestation methods are able to completely modify the control 
flow of a recursive function. 
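
The equivalence of the two programs of Figure 3 can be checked on a sample tree. This is a sketch in OCaml rather than in the paper's FP language: it assumes OCaml lists for cons/nil, uses List.rev for reverse (whose definition is not shown in the figure), and renames the generated variables for readability. The names f_orig and f_deforested are hypothetical.

```ocaml
type tree = Node of tree * tree | Leaf of int

(* Original program (Figure 3-left). *)
let rec flat x h = match x with
  | Node (a, b) -> flat a (flat b h)
  | Leaf n -> n :: h
let flatten x = flat x []
let f_orig x = List.rev (flatten x)

(* Deforested program (Figure 3-right). *)
let rec fpfun_1 t acc = match t with
  | Node (a, b) -> fpfun_1 b (fpfun_1 a acc)
  | Leaf n -> n :: acc
let f_deforested t = fpfun_1 t []

let () =
  let t = Node (Node (Leaf 1, Leaf 2), Node (Leaf 3, Leaf 4)) in
  (* Same result, but no intermediate list is built and reversed. *)
  assert (f_orig t = f_deforested t);
  assert (f_deforested t = [4; 3; 2; 1])
```

The only difference between flat and fpfun_1 is the order in which the two sub-trees are visited, which is the change of control flow discussed in the text.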



(left)
let append x y = match x with
    cons a b -> cons a (append b y)
  | nil -> y

let f x y z = (append (append x y) z)

(right)
let fpfun_2 =
  fun t_38 -> (
  fun t_39 -> (match t_38 with
    | nil -> t_39
    | cons t_41 t_42 ->
        ((cons t_41)
          ((fpfun_2 t_42) t_39))
  ))

let f =
  fun t_16 -> (
  fun t_17 -> (
  fun t_15 -> (
    ((fpfun_2 t_16)
      ((append t_17) t_15))
  )))

Fig. 4. composition with append



Left (original program):

    let revho x = match x with
        cons a b ->
          let k = (revho b) in
          (fun h -> k (cons a h))
      | nil -> (fun h -> h)

    let reverse x = ((revho x) nil)

Right (after deforestation):

    let fpfun_1 =
      fun t_11 -> (
      fun t_12 -> (match t_11 with
        | nil -> t_12
        | cons t_14 t_15 ->
            ((fpfun_1 t_15)
              ((cons t_14) t_12))
      ))

    let reverse =
      fun t_3 -> (((fpfun_1 t_3) nil))

Fig. 5. reverse with higher order



Inefficient composition: Figure 4-left presents the function append, which
appends two lists, and the function f, which appends three lists. Actually, the
expression (append (append x y) z) should be translated into
(append x (append y z)) to avoid one duplication of each list x and y.
Deforestation performs the transformation automatically, as shown in Figure 4-right.
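To make the cost difference concrete, the two versions of f can be compared by counting cons-cell allocations. This is a sketch under assumptions: the tuple encoding of lists, the global counter `allocations`, and the helpers `from_list` and `count_conses` are illustrative and not part of the paper's system.

```python
NIL = ("nil",)
allocations = 0  # number of cons cells built since the last reset

def cons(a, b):
    global allocations
    allocations += 1
    return ("cons", a, b)

def from_list(items):
    out = NIL
    for item in reversed(items):
        out = cons(item, out)
    return out

def append(x, y):
    if x[0] == "cons":
        return cons(x[1], append(x[2], y))
    return y

def f_original(x, y, z):
    # append (append x y) z: the cells of x are rebuilt twice
    return append(append(x, y), z)

def fpfun_2(t38, t39):
    if t38[0] == "nil":
        return t39
    return cons(t38[1], fpfun_2(t38[2], t39))

def f_deforested(x, y, z):
    # Shape of Figure 4-right: fpfun_2 x (append y z)
    return fpfun_2(x, append(y, z))

def count_conses(fn, *args):
    global allocations
    allocations = 0
    result = fn(*args)
    return result, allocations

x, y, z = from_list([1, 2]), from_list([3]), from_list([4, 5])
r1, n1 = count_conses(f_original, x, y, z)
r2, n2 = count_conses(f_deforested, x, y, z)
```

With these inputs the original version allocates 5 cells (2 for x, then 3 for the intermediate list) while the deforested version allocates 3.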

Removing continuations: As a last example, we transform the reverse function
written with a continuation, given in Figure 5-left. The deforested data is the
continuation. The result in Figure 5-right is equal to the standard function rev
with an accumulator. This result shows the power of working in a system which
does not include function calls. In equational semantics, functional values are
encoded like other values, and thus they can be treated in the same way. Here,
the elimination of the continuation is performed by the standard deforestation
for equational programs.
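The two programs of Figure 5 can be compared directly. The sketch below is a Python transliteration in which Python lists stand for `cons`/`nil` lists; the names `reverse_cps` and `reverse_acc` are assumptions chosen for illustration.

```python
def revho(x):
    # Figure 5-left: returns a continuation that expects the accumulator h.
    if x:
        a, b = x[0], x[1:]
        k = revho(b)
        return lambda h: k([a] + h)
    return lambda h: h

def reverse_cps(x):
    return revho(x)([])

def fpfun_1(t11, t12):
    # Figure 5-right: the continuation has disappeared; t12 is a plain accumulator.
    if not t11:
        return t12
    return fpfun_1(t11[1:], [t11[0]] + t12)

def reverse_acc(x):
    return fpfun_1(x, [])
```

Both definitions compute the same reversal, but the second never builds a closure.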

6 Translation 

In this section, we will see how to translate a functional program into an equa- 
tional program. This translation works by a simple encoding of the operational 
semantics of functional programs given in section 2. 

Constructors and primitives are directly translated from FP to equational 
notations. But since there is no function in equational programs, we have to 
define a new constructor for each partially applied function. Thus, the functional
value $(f_p\ v_1 \ldots v_p)$ is also a value in equational semantics, by considering $f_p$ as a
classical constructor.
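The idea of treating a partially applied function as a constructor can be sketched as follows. Everything here is an illustrative assumption: the value $(f_k\ v_1 \ldots v_k)$ is encoded as the tuple `(name, k, v1, ..., vk)`, and the tables `ARITY` and `IMPL` and the example function `add3` are hypothetical.

```python
ARITY = {"add3": 3}                          # hypothetical function of arity 3
IMPL = {"add3": lambda a, b, c: a + b + c}   # its body, run once saturated

def apply(fv, arg):
    """Apply the functional value fv = (name, k, v1, ..., vk) to one argument."""
    name, k, *args = fv
    args = args + [arg]
    if k + 1 == ARITY[name]:
        # All parameters are available: evaluate the body.
        return IMPL[name](*args)
    # Otherwise build the next "constructor" f_{k+1} holding one more argument.
    return (name, k + 1, *args)

f0 = ("add3", 0)      # the value (f_0)
f1 = apply(f0, 1)     # the value (f_1 1)
f2 = apply(f1, 2)     # the value (f_2 1 2)
result = apply(f2, 3)
```

Each intermediate value is ordinary data, which is what allows functional values to be handled like any other constructor term.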

The main concept driving the translation consists in the management of
functions and applications, and relies on the following theorem:

Theorem 61 If $(v_i)_{i \le 3}$ are values, then:

$$(v_1\ v_2) \to v_3 \iff (v_1, v_2)\ [\![\mathcal{P}]\!]_{arg,call}\ (v_3)$$

where $[\![\mathcal{P}]\!]$ is the semantics of the equational program $\mathcal{P}$ translated from FP.
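Left (function definitions):

• $fun[\![\text{let } f\ x_1 \ldots x_n = e]\!] = \Pi \cup \{f_n \to X\}$
  where:
    $\Pi = \bigcup_{0 \le k < n} Closure(f, k)$
    $\gamma = (x_k : a.k)_{1 \le k < n}\,(x_n : a.arg)$
    $X = [\![e]\!]\ a.call, \gamma$

• $fun[\![\text{let } f\ x_1 \ldots x_n = \text{fun } c\ y_1 \ldots y_m \to e_c \mid \ldots]\!] = \Pi \cup \Pi' \cup (\bigcup_c \Pi_c)$
  where:
    $\Pi = \bigcup_{0 \le k < n} Closure(f, k)$
    $\gamma = (x_k : a.k)_{1 \le k \le n}$
    $m = new\_local\_nbr()$
    $\Pi' = f_n \to \{\; a.call = a.L_m.f\_call;\quad a.L_m.f\_x_k = a.k \ (1 \le k \le n);\quad a.L_m = a.arg \;\}$
    $\gamma_0 = (x_k : a.f\_x_k)_{1 \le k \le n}$
    $(\forall c)\ \gamma_c = \gamma_0\,(y_j : a.j)_{j \le m}$
    $(\forall c)\ \Pi_c = c \to [\![e_c]\!]\ a.f\_call, \gamma_c$

• $Closure(f, k) = f_k \to \{\; a.call = (f_{k+1}\ a.1 \ldots a.k\ a.arg) \;\}$

Right (expressions):

• $[\![x]\!]\ y, \gamma = \{y = \gamma(x)\}$

• $[\![f]\!]\ y, \gamma = \{y = (f_0)\}$

• $[\![(e_1\ e_2)]\!]\ y, \gamma = \bigcup_{i \le 2} X_i$
  where:
    $m = new\_local\_nbr()$
    $X_0 = \{y = a.L_m.call\}$
    $X_1 = [\![e_1]\!]\ a.L_m, \gamma$
    $X_2 = [\![e_2]\!]\ a.L_m.arg, \gamma$

• $[\![(c\ e_1 \ldots e_n)]\!]\ y, \gamma = \{y = (c\ y.L_{m_1} \ldots y.L_{m_n})\} \cup (\bigcup_i X_i)$
  where:
    $(\forall i)\ m_i = new\_local\_nbr()$
    $(\forall i)\ X_i = [\![e_i]\!]\ y.L_{m_i}, \gamma$

• $[\![(\pi\ e_1 \ldots e_n)]\!]\ y, \gamma = \{y = (\pi\ y.L_{m_1} \ldots y.L_{m_n})\} \cup (\bigcup_i X_i)$
  where:
    $(\forall i)\ m_i = new\_local\_nbr()$
    $(\forall i)\ X_i = [\![e_i]\!]\ y.L_{m_i}, \gamma$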



Fig. 6. Translation: function and expressions 



For each function definition, the function $fun[\![\,]\!]$ defined in Figure 6-left
computes a piece of the expected equational program $\mathcal{P}$. Pieces of equational
programs are denoted by $\Pi$, and the notation $\Pi = c \to X$ means that
$\Pi = \{c \to x = t \mid x = t \in X\}$. The expression $[\![e]\!]\ x, \gamma$, defined in Figure 6-right,
computes a set of statements such that:

Theorem 62

$$e \to u \iff X \Rightarrow_{\mathcal{P}} \{x = u\} \qquad \text{with: } X = [\![e]\!]\ x, \gamma$$

For instance, the definition of the function rev is translated into:

    rev_0 → a.call = (rev_1 a.arg)
    rev_1 → a.call = a.L_1.rev_call
      "   → a.L_1.rev_h = a.1
      "   → a.L_1 = a.arg
    nil   → a.rev_call = a.rev_h

    cons  → a.rev_call = a.L_2.call
      "   → a.L_2.arg = a.2
      "   → a.L_2 = a.L_3.call
      "   → a.L_3.arg = (cons a.L_4 a.L_5)
      "   → a.L_3 = (rev_0)
      "   → a.L_4 = a.1
      "   → a.L_5 = a.rev_h
After partial evaluation and renaming of local variables, the following
equational program is obtained:

    rev_0 → a.call = (rev_1 a.arg)
    rev_1 → a.call = a.L_1.rev_call
      "   → a.L_1.rev_h = a.1
      "   → a.L_1 = a.arg

    cons  → a.rev_call = a.2.rev_call
      "   → a.2.rev_h = (cons a.1 a.rev_h)
    nil   → a.rev_call = a.rev_h



7 Strategies 

In contrast with Section 4, this section concerns the recursive structure of the
derivations. The aim is to construct canonical derivations for the relation $[\![\mathcal{P}]\!]_{P,R}$.

We will say that a system $\Sigma$ defines a variable x if and only if there exists a
value v such that $x = v \in \Sigma$. By extension, we will say that $\Sigma$ defines a set of
attributes A on a variable x if and only if $\Sigma$ defines the variable x and all the
variables x.a where $a \in A$.



7.1 An example 

Consider the following program $\mathcal{P}$:

    cons → a.r = a.2.r              (eqn_1)
    cons → a.2.p = (cons a.1 a.p)   (eqn_2)
    nil  → a.r = a.p                (eqn_3)

These three equations are named eqn_i to simplify notations. For any list l,
the following statement holds:

$$(l, nil)\ [\![\mathcal{P}]\!]_{p,r}\ (l')$$

where l' is the list l reversed. Recall that such a statement means that there
exists a derivation $\Sigma_{in} \Rightarrow_{\mathcal{P}} \Sigma_{out}$ where $\Sigma_{in}$ defines p on a and $\Sigma_{out}$ defines r
on a. In this section, we want to find the structure of a possible derivation for
$\Sigma_{in} \Rightarrow_{\mathcal{P}} \Sigma_{out}$.

We need new notations here. In Section 4.2, we denoted by $\xrightarrow{y=t,\,x}$ the
instantiation of the equation y = t on the variable x. In the same way, we
denote by $\xrightarrow{p,r,x}$ a derivation which allows x.r to be defined from x.p. More precisely,
if a system $\Sigma$ defines p on x and $\Sigma \xrightarrow{p,r,x} \Sigma'$, then $\Sigma'$ defines r on x.

So, let us start with a derivation $\Sigma \xrightarrow{p,r,a} \Sigma'$. From the definition of $(\Rightarrow_{\mathcal{P}})$,
we can observe that its $(\Rightarrow_0)$ part is not able to introduce a variable a.r on the
left side of an equation. Thus at least one equation of $\mathcal{P}$ has been instantiated
on a. This requires that either $a = (cons\ v_1\ v_2)$ or $a = (nil)$. Then, it is possible
to inductively construct $\xrightarrow{p,r,a}$ as follows:

- Either $a = (nil)$: it is possible to use the derivation $\xrightarrow{eqn_3,\,a}$, thus applying
  the equation associated to nil in the program $\mathcal{P}$. Then $(\Rightarrow_0)$ will perform all
  the sub-term, substitution and primitive derivations to produce a system $\Sigma'$
  which defines r on a, without using any other equation of $\mathcal{P}$. More generally,
  using Theorem 41 leads to the expected derivation for $x = (nil)$:

  $$\xrightarrow{p,r,x}_{nil} \;=\; [\, \xrightarrow{eqn_3,\,x} \,;\ \Rightarrow_0 \,]$$

- Or $a = (cons\ v_1\ v_2)$: the strategy consists here in recursively applying
  the derivation $\xrightarrow{p,r,a.2}$. As a first step, we get a system which defines p on
  a.2 thanks to the derivation $\xrightarrow{eqn_2,\,a}$, followed by $\Rightarrow_0$ to perform substitution,
  sub-term and primitive derivations. Then we use the derivation $\xrightarrow{p,r,a.2}$ to get
  a system which defines r on a.2, and we end the process by applying $\xrightarrow{eqn_1,\,a}$
  followed by $\Rightarrow_0$ to get a system which defines r on a. More generally, using
  Theorem 41 leads to:

  $$\xrightarrow{p,r,x}_{cons} \;=\; [\, \xrightarrow{eqn_2,\,x} \,;\ \xrightarrow{p,r,x.2} \,;\ \xrightarrow{eqn_1,\,x} \,;\ \Rightarrow_0 \,]$$


Thus, a possible definition of the expected derivation is the inductive definition:

$$\xrightarrow{p,r,x} \;=\; \xrightarrow{p,r,x}_{cons} \;+\; \xrightarrow{p,r,x}_{nil}$$

With this definition, for every system $\Sigma_1$ which defines p on x, and $\Sigma_2$ which
defines r on x, the following statement holds (footnote 4):

$$\Sigma_1 \Rightarrow_{\mathcal{P}} \Sigma_2 \iff \Sigma_1 \xrightarrow{p,r,x} \Sigma_2$$

As a shortcut, we summarize all these properties by the following notations,
which define an operational semantics for $\mathcal{P}$:

$$[\![\mathcal{P}]\!]^{op} = \{\, (p, r, cons, seq_1),\ (p, r, nil, seq_2) \,\}$$

where:

    seq_1 = [eqn_2; p, r, a.2; eqn_1]
    seq_2 = [eqn_3]
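The strategy above can be read as a small interpreter over a system of equations. The following Python sketch makes that reading explicit under several assumptions: a system is a dictionary from variable paths (strings such as `"a.2.p"`) to values, lists are encoded as `("cons", head, tail)` and `("nil",)`, and the sub-term derivations of $\Rightarrow_0$ are performed eagerly. The function name `derive` is hypothetical.

```python
NIL = ("nil",)

def derive(env, x):
    """Extend env so that it defines r on x, assuming it defines x and x.p."""
    if env[x][0] == "cons":
        _, head, tail = env[x]
        # Sub-term derivations: x.1 and x.2 become defined.
        env[x + ".1"], env[x + ".2"] = head, tail
        # eqn_2 instantiated on x: x.2.p = (cons x.1 x.p)
        env[x + ".2.p"] = ("cons", env[x + ".1"], env[x + ".p"])
        # Step (p, r, x.2): recursive use of the strategy on the tail.
        derive(env, x + ".2")
        # eqn_1 instantiated on x: x.r = x.2.r
        env[x + ".r"] = env[x + ".2.r"]
    else:
        # seq_2 = [eqn_3]: x.r = x.p
        env[x + ".r"] = env[x + ".p"]

lst = ("cons", 1, ("cons", 2, ("cons", 3, NIL)))
system = {"a": lst, "a.p": NIL}
derive(system, "a")
```

After the call, `system["a.r"]` holds the reversed list, matching the statement that the program relates (l, nil) to the reversal of l.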



7.2 Operational semantics 

In this section we refine the definitions above. An elementary derivation step s is
either an equation y = t of the program $\mathcal{P}$, or a triplet (P, R, y). A sequence seq
is a concatenation $[s_1; \ldots; s_n]$ of steps. A strategy S is a set of tuples (P, R, c, s),
where $P \subseteq Prm$ and $R \subseteq Res$ are two sets of attributes, c is a constructor, and
s is a sequence.

(Footnote 4) The proof for such a statement is made by induction on the length of the
derivation and then by case analysis on the equations of $\mathcal{P}$ that have been applied to x.
The proof largely makes use of Theorem 41.

From a strategy S, the derivation $\xrightarrow{P,R,x}_S$ is defined recursively by:

$$\Sigma \xrightarrow{P,R,x}_S \Sigma' \iff x = (c \ldots) \in \Sigma \ \text{ and } (P', R, c, seq) \in S,\ P' \subseteq P \ \text{ and } \Sigma \xrightarrow{seq,\,x}_S \Sigma'$$

$$\Sigma \xrightarrow{seq,\,x}_S \Sigma' \iff seq = [s_1; \ldots; s_n] \ \text{ and } \Sigma \xrightarrow{s_1,\,x}_S \cdots \xrightarrow{s_n,\,x}_S \Sigma'$$



S "4*/ 27' 



_ V—t,x 

r S' 



(P,R,v),x 

s -^s 



S' ^s 



P,R,y[x] 

~^S 



S' 



A strategy S is an operational semantics for $\mathcal{P}$ if and only if, for every system
$\Sigma_P$ defining P on x, and every system $\Sigma_R$ defining R on x, the following holds:

$$\Sigma_P \Rightarrow_{\mathcal{P}} \Sigma_R \iff \Sigma_P \xrightarrow{P,R,x}_S \Sigma_R$$

We denote such a statement by $[\![\mathcal{P}]\!]^{op} = S$, and we have the following property:



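$$(v, v_1 \ldots v_n)\ [\![\mathcal{P}]\!]^{op}_{P,R}\ (w_1 \ldots w_m) \iff \{a = v;\ a.p_1 = v_1;\ \ldots;\ a.p_n = v_n\} \xrightarrow{P,R,a}_S \{a.r_1 = w_1;\ \ldots;\ a.r_m = w_m\}$$

Notice the difference with the definition of $[\![\mathcal{P}]\!]$:

$$(v, v_1 \ldots v_n)\ [\![\mathcal{P}]\!]_{P,R}\ (w_1 \ldots w_m) \iff \{a = v;\ a.p_1 = v_1;\ \ldots;\ a.p_n = v_n\} \Rightarrow_{\mathcal{P}} \{a.r_1 = w_1;\ \ldots;\ a.r_m = w_m\}$$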

While $[\![\mathcal{P}]\!]$ gives no information about the structure of the derivation $\Rightarrow_{\mathcal{P}}$
involved in this semantics, $[\![\mathcal{P}]\!]^{op}$ exhibits a derivation with a fixed scheme of
recursion.



7.3 Construction of $[\![\mathcal{P}]\!]^{op}$

This section presents the algorithm which finds such an operational semantics for
a given equational program $\mathcal{P}$. In other words, we want to translate $[\![\mathcal{P}]\!]$ into $[\![\mathcal{P}]\!]^{op}$.
The kernel of the algorithm is a fixpoint computation of a well-suited strategy,
denoted by $S_\infty$. We introduce two notations:

A ^n,v S' ^ {3S" D S') S S' 

R{n,S)<^ V(P, i?), Sp^n,pSp^Sp Sp 

The fixpoint computation involves a function Next and an order relation $(\sqsubseteq)$
such that the following theorem holds:

Theorem 71 



(Vn) R{n,S) ^ %{n + 1,S Next{S)) 
S \zS' ^ Next{S) C Next(S') 
The relation $(\sqsubseteq)$ is defined in the next section and allows us to define a lattice
structure on the strategies. Then it is easy to prove that the greatest fixpoint of
Next exists and that the following holds:

Theorem 72 $(\forall n)\ \mathcal{H}(n, \text{gfix } Next)$ and $[\![\mathcal{P}]\!]^{op} = \text{gfix } Next$.

The function Next is long and complex to define precisely. The next section
provides guidelines to understand its complete definition.

7.4 The function Next 

In spite of its long definition, the construction of the Next function is intuitive, and
consists in exploring what a strategy could consist of. The extended definition
is given in Figure 7, and is only the formalization of what we presented in
the introductory example. This section just provides guidelines (in small font) to
understand the role of the different components involved in these definitions.

The Next function computes strategies independently for each constructor through
the function Pool. Actually, we must ensure that a derivation $\xrightarrow{P,R,x}$ will be available for
all the constructors involved in computing the attributes in R. This set of constructors is
denoted by $\Gamma_R$. The predicate implemented(P, R, S) tests whether a strategy is available for
the pair (P, R) in S for each constructor in $\Gamma_R$.

The function Pool(c, S) computes all available strategies when $a = (c\ v_1 \ldots v_n)$ and
when the recursive derivations are taken from S. By a fixpoint algorithm, this function
computes a set of tuples (D, seq, P, R), where D is a set of variables, seq a sequence
of steps, and P and R two sets of attributes. The invariant property maintained at each
iteration of the fixpoint computation is the following. For each tuple (D, seq, P, R), seq
is a derivation such that:

Let $\Sigma$ be a system which defines P on a, and where $a = (c \ldots)$. Suppose that applying
the sequence seq on $\Sigma$ leads to the system $\Sigma'$, that is: $\Sigma \xrightarrow{seq} \Sigma'$. Then, this system
$\Sigma'$ defines R on a, and it also defines all the variables in the set D.

Each iteration of this fixpoint algorithm is computed by the function Infer(c, S)(E),
which adds new steps to the sequences seq inside tuples $(D, seq, P, R) \in E$, maintaining
the invariant property above.

The other functions compute auxiliary results. Thus need(s) and prod(s) respectively
compute the variables that need to be defined before applying the step s,
and those which are produced after this step.
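The auxiliary functions need and prod can be sketched directly from their definitions in Figure 7. The encoding below is an assumption made for illustration: an equation step is `("eq", x, t)`, a recursive step is `("call", P, R, x)`, a term is either a variable path (a string) or a tuple `(constructor, subterm, ...)`, and `vars_of` plays the role of Vars.

```python
def vars_of(term):
    # Variables occurring in a term (the role of Vars(t) in Figure 7).
    if isinstance(term, str):
        return {term}
    found = set()
    for sub in term[1:]:
        found |= vars_of(sub)
    return found

def need(step):
    if step[0] == "eq":
        _, x, t = step
        return vars_of(t)                        # need(x = t) = Vars(t)
    _, P, R, x = step
    return {x} | {f"{x}.{p}" for p in P}         # need(P, R, x) = {x} U {x.p | p in P}

def prod(step):
    if step[0] == "eq":
        return {step[1]}                         # prod(x = t) = {x}
    _, P, R, x = step
    return {f"{x}.{r}" for r in R}               # prod(P, R, x) = {x.r | r in R}

# The sequence seq_1 = [eqn_2; p, r, a.2; eqn_1] of Section 7.1
eqn_2 = ("eq", "a.2.p", ("cons", "a.1", "a.p"))
call = ("call", {"p"}, {"r"}, "a.2")
eqn_1 = ("eq", "a.r", "a.2.r")
```

On this sequence, each step needs only variables that are either initially defined (a, a.1, a.2, a.p) or produced by an earlier step, which is the ordering constraint the fixpoint computation has to respect.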

The Next function has a greatest fixpoint, with the following order $\sqsubseteq$ on strategies:

5 C 5' (yp,p',R) 

I implemented{P' , R, S') 

I and implemented {P, R, S) 

A good starting strategy $S_0$ to initialize the fixpoint computation is the following
one:



^ ^ PCP' 



$$S_0 = \{\, (\emptyset, r, c, [\,]) \mid c \in \Gamma_r,\ r \in Res \,\}$$

This strategy ensures that the Next function can find at least one strategy for
each result attribute. It is not possible to start with an empty strategy, because the
fixpoint would be empty.
The algorithm provided here is well suited to proofs, but is completely inefficient
in practice. A naive implementation of the Next function leads to an algorithm with
prohibitive exponential cost. Essentially, this complexity comes from the permutations allowed by
the confluence Theorem 41, and from the large number of possible pairs (P, R) to consider.
Our implementation improves this algorithm in order to take these permutations into
account, and to control and limit the number of pairs (P, R) to be considered.



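• $Next(S) = \{\, (P, R, c, seq) \mid (\exists P' \subseteq P)\ (\_, seq, P', R) \in Pool(c, S)$
      and $\forall c' \in \Gamma_R,\ (\exists P' \subseteq P)\ (\_, \_, P', R) \in Pool(c', S) \,\}$

• $\Gamma_R = \{\, c \mid \forall r \in R\ \exists t,\ c \to a.r = t \in \mathcal{P} \,\}$

• $implemented(P, R, S) \iff \forall c \in \Gamma_R\ (\exists P' \subseteq P)\ (P', R, c, \_) \in S$

• $Pool(c, S) = fix\ (Infer(c, S))\ E_0$
      with $E_0 = \{(D_c, [\,], \emptyset, \emptyset)\}$
      with $D_c = \{a;\ a.1;\ \ldots;\ a.n\}$, n being the arity of c

• $Infer(c, S)(E) = E \cup \{\, Add(s, e) \mid e \in E$ and $defined(s, e)$ and $callable(s, S) \,\}$

• $Add(s, (D, seq, P, R)) = (\ D \cup prod(s),\ [seq; s],$
      $P \cup (need(s) \cap \{a.p \mid p \in Prm\}),$
      $R \cup (prod(s) \cap \{a.r \mid r \in Res\})\ )$

• $defined(s, (D, \_, \_, \_)) \iff need(s) \subseteq D \cup \{a.p \mid p \in Prm\}$

• $callable((x = t), S) \iff true$

• $callable((P, R, x), S) \iff implemented(P, R, S)$

• $need(x = t) = Vars(t)$

• $need(P, R, x) = \{x\} \cup \{x.p \mid p \in P\}$

• $prod(x = t) = \{x\}$

• $prod(P, R, x) = \{x.r \mid r \in R\}$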



Fig. 7. The function Next 



8 Backward Translation 

This section gives guidelines about the translation from the operational semantics
of an equational program (i.e. a strategy) into a new functional program.

Though it consists of many steps, the translation is not complex. During
the first step, the strategy $[\![\mathcal{P}]\!]^{op}$ is reduced to a new strategy, denoted by S,
such that for each triplet (P, R, c) there is at most one sequence seq such that
$(P, R, c, seq) \in S$. Selecting which sequence is optimal is a difficult problem,
but simple heuristics are sufficient to choose interesting sub-optimal ones.
Actually, we choose sequences with few constructors (to save space), few
non-evaluated expressions (like y = x.r with x = (c ...)) and few compositions (to
perform deforestation) (footnote 5).

The second step defines the functions to be created in order to implement
the strategies. For each pair (P, R), since there is only one sequence seq per
constructor c in S, the relation $\xrightarrow{P,R,x}_S$ can be implemented by a pattern-matching
function. This function has one parameter per attribute in P, and returns a
tuple (footnote 6) with one value per attribute in R. Then, for each sequence seq, a piece
of code is generated.

(Footnote 5) Actually, we need an approximation of the complexity of each sequence. We
believe that related abstract interpretations and static analyses may be used to improve this
step. Future work will investigate this possibility.

To implement a sequence seq, the idea consists in associating each variable
in seq with a fresh local variable of f. For instance, the strategy of Section 7.1
is implemented in the following way:

    let f x1 = fun
      | cons y1 y2 ->
          let z2 = (cons y1 x1) in
          let z3 = (f z2 y2) in
          let z1 = z3 in
          z1
      | nil ->
          let z1 = x1 in
          z1

The association table for the variables is:

    a.1 : y1    a.p : x1    a.2.p : z2
    a.2 : y2    a.r : z1    a.2.r : z3

But for each variable which is used only once in a sequence, the local variable
is not necessary, and its definition can be inlined. Thus, the following function
is generated:

    let f x1 = fun
      | cons y1 y2 -> f (cons y1 x1) y2
      | nil -> x1

From this basic scheme, there exist many variations. For instance, a constructor c
for which a unique pair (P, R) is defined should be interpreted as a function-closure
constructor. Then, no pattern matching is needed, and a purely functional
expression is generated. Special treatment is also performed for constructors
which correspond to tuples. See the results in Section 5 for examples.

9 Conclusion 

Equational programs and semantics have been designed to perform program
transformations in the context of functional programming. However, this
framework does not rely on functional definitions, such as functors, morphisms or
λ-calculus. Thus, the control flow of a program is not embedded in any fixed
recursion scheme. Since the control flow is reconstructed after applying program
transformations, it can be completely transformed. This provides significant
improvements to many program transformations, especially to partial evaluation
and deforestation.

Another interest of this approach is that equational semantics is not restricted
to functional programs and could be used to model other programming
paradigms. The key idea of such a semantics is to separate, as far as possible,
what is computed from how it is computed. Such an idea should be widely used
to improve existing transformation methods.

(Footnote 6) It is easy to add tuples to functional programs as syntactic sugar.

This work draws on various interesting formalisms and programming paradigms.
For many years, we have been collecting the best of existing techniques,
such as attribute grammar deforestation [1], folds and hylomorphisms fusion [4],
and type-directed or calculational deforestation [8,10,9,11]. But these formalisms
were too different from each other to be compared and to produce useful
cross-fertilization. This is why we now try to recast them in a new theoretical
and implementable framework. Following this driving idea, the notion of
equational programs arose naturally, and equational semantics was not far away.
Abstract. In this paper, we explain how we use abstract interpretation for analysing
temporal specifications in TLA+. An analysis is obtained by building a predicate
behavior which satisfies the specification. Abstract interpretation allows us to
move from a concrete world to an abstract world (generally finite). Using abstract
interpretation, we build abstract predicate behaviors and, in general, if the abstract
interpretation is sufficiently powerful and expressive, we can build a finite graph
of abstract predicates to analyse a temporal specification. TLA/TLA+ is based
on an untyped framework, namely ZF set theory, and we show how abstract
interpretation fits the requirements of untyping and makes the analysis of temporal
specifications easier.



1 Introduction 

Temporal specifications provide an abstract and powerful framework for modelling
(reactive) systems with respect to safety, liveness and fairness properties, for example
telecommunications services and the feature interaction problem. However, techniques
and tools are required for analysing temporal specifications, for instance
model checking-based techniques [19] or theorem proving-based techniques [20],
because temporal notations are powerful expressions of rich concepts (fairness) and are
therefore very difficult to validate. We focus our work on TLA/TLA+ as a specification
language, but we require the possibility of validating temporal specifications by animation
with respect to an abstraction.

The environment supporting the B method, namely Atelier B [1,22], provides us with a
kernel of procedures for manipulating syntactical terms, and we have used this kernel,
called Logic-Solver, for analysing temporal specifications in a given abstraction. As
a matter of fact, our work was driven by the need to develop tools for TLA/TLA+
and to validate temporal modelling of services. We have defined an analyser of temporal
specifications and have implemented it using Logic-Solver [3]. It allows us «to simulate»,
and thus to validate, temporal specifications from the beginning of the refinement process,
allowing the user to observe predicate behaviors which satisfy the temporal specification.
A predicate behavior models a set of state behaviors. In this way, predicate behaviors
are abstractions (inverse refinements) derived from state behaviors. The technique puts
together predicates in order to model behaviors. Initially, our ideas were very intuitive,
and we are now founding them on a theoretical framework, abstract interpretation.



A. Cortesi, G. File (Eds.): SAS’99, LNCS 1694, pp. 284-299, 1999. 
(c) Springer- Verlag Berlin Heidelberg 1999 
Abstract interpretation [5,6,7,12,13] makes it possible to represent an infinite set of
values using a finite set of values (for example, sets of integers can be upper-approximated
by congruences [12,13]). It is a very general framework that formalizes links between
abstraction and concretization through Galois connections and that provides mathemati- 
cal tools for analysing properties of programs or systems by approximating fixed-points 
which model invariants. Moreover, the analysis of programs or systems is carried out 
through assumptions made in the domain of abstraction. In the abstract domain, it is pos- 
sible to evaluate operations or statements in an abstract way, thus making it possible to 
move our reasoning from an infinite concrete world to a simpler abstract world. Abstract 
interpretation is often used to calculate invariants of a program or program component 
and has been used successfully for an industrial code (The ARIANE 5 example) [14]. 
We have used abstract interpretation: 

- to improve the visibility of certain predicate behaviors and, especially, to gather an 
infinite number of predicates (concrete) in a finite number of abstract predicates 
while allowing the evaluation (abstract), 

- to build abstractions of predicates that have less precise interpretations (abstract), 

- to build a finite graph that schedules abstract predicates of an abstract behavior. This 
graph makes it possible to model the behavior of the abstract specification and also 
enables better code generation for an implementation, easier proofs of invariants. 

Our approach leads to an abstract animation of specifications or symbolic execution of 
specifications, since it extends the scope of animation. The animation process improves 
the communication between the customer and the specifier, since a temporal specification 
is very difficult to understand. Thus the animation helps in the validation process. We can 
merge different kinds of abstractions and keep a part of our domain still uninterpreted. 
We call this a partial abstraction. 

Section 2 gives a short presentation of TLA and TLA+ and its semantics. Section 3 
sketches the process for animating temporal specifications. Section 4 describes abstract 
interpretation adapted to our framework. Section 5 defines an (untyped) abstraction 
for TLA. Section 6 illustrates applications of our works to abstract animation and to 
the computation of invariants. The last section concludes our work and gives future 
directions for this approach. 

2 The temporal logic of actions and its semantics 

TLA+ [17] is a temporal specification language based on set theory with the choice 
axiom and the temporal logic of the actions TLA [16]. The language has a mechanism 
for structuring in the form of modules, either by extension, or by instance. TLA is a 
temporal logic making it possible to specify reactive, parallel and distributed systems. 
The semantics of TLA is based on state behaviors of variables. It can be viewed as a 
logic built in an incremental manner in three stages: 

1. predicates, whose free variables are rigid and flexible variables and whose semantics
   is based on states. A state s satisfies the predicate P if and only if $s[\![P]\!]$ is true. $s[\![P]\!]$
   is the value obtained by substituting in P each variable by its value in the state s (for
   instance $s[\![x = 0]\!] = (s[\![x]\!] = 0)$) (footnote 1),

2. actions, which are logical formulas having primed flexible variables as well as free
   variables and whose semantics is based on pairs of states. A pair of states (s, t)
   satisfies the action A if and only if $s[\![A]\!]t$ is true. $s[\![A]\!]t$ is the value obtained by
   substituting in A each unprimed variable by its value in the state s and each primed variable
   by its value in the state t (for instance $s[\![x' = x+1]\!]t = (t[\![x]\!] = s[\![x]\!]+1)$). As
   predicates are actions, we thus have $s[\![P]\!]t = s[\![P]\!]$ and $s[\![P']\!]t = t[\![P]\!]$,

3. temporal formulas of actions (addition of the operator $\Box$), whose semantics is based
   on state behaviors of variables. $\langle s_0, s_1, \ldots\rangle[\![\Box T]\!]$ (a behavior $\langle s_0, s_1, \ldots\rangle$ satisfies
   $\Box T$) is true if, and only if, $\forall n \in Nat : \langle s_n, s_{n+1}, \ldots\rangle[\![T]\!]$. As an action is a temporal
   formula, we have $\langle s_0, s_1, \ldots\rangle[\![A]\!] = s_0[\![A]\!]s_1$ (for example, $\langle s_0, s_1, \ldots\rangle[\![\Box(x' = x+1)]\!] =
   \forall n \in Nat : \langle s_n, \ldots\rangle[\![x' = x+1]\!] = \forall n \in Nat : s_n[\![x' = x+1]\!]s_{n+1}$).

Moreover, at each stage, the definitions of $[\![F \wedge G]\!]$ and $[\![\neg F]\!]$ are very simple:

$$\sigma[\![F \wedge G]\!] = \sigma[\![F]\!] \wedge \sigma[\![G]\!]$$

with $\sigma$ a state behavior, a pair of states or a state.
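The three semantic levels can be mimicked on finite data. In the sketch below, states are dictionaries, predicates and actions are Python functions, and the operator $\Box$ is only checked on a finite prefix of a behavior, which is an approximation since TLA behaviors are infinite. All function names are illustrative assumptions.

```python
def holds_pred(s, P):
    # s[[P]]: value of predicate P in state s
    return P(s)

def holds_action(s, t, A):
    # s[[A]]t: unprimed variables read in s, primed variables read in t
    return A(s, t)

def always_on_prefix(behavior, A):
    # Finite approximation of <s0, s1, ...>[[[]A]]: A must hold on every
    # pair of consecutive states of the given prefix.
    return all(holds_action(s, t, A) for s, t in zip(behavior, behavior[1:]))

x_is_zero = lambda s: s["x"] == 0                 # predicate x = 0
increment = lambda s, t: t["x"] == s["x"] + 1     # action x' = x + 1

good = [{"x": 0}, {"x": 1}, {"x": 2}]
bad = [{"x": 0}, {"x": 2}]
```

The example mirrors the text: the action x' = x+1 relates s and t exactly when the value of x in t is the value of x in s plus one.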

A TLA specification looks like: $Init \wedge \Box[Next]_v \wedge L$ where

- Init is the predicate which specifies initial states ($s_0[\![Init]\!]$)

- $\Box[Next]_v$ means that either two consecutive states are equal on v, $v' = v$ (stuttering,
  footnote 2), or Next is an action (a relation) which binds two consecutive states by using
  the (unprimed) variable for the first state and the primed variable for the second
  state ($\forall n \in Nat : s_n[\![Next]\!]s_{n+1}$)

- L is a fairness assumption (strong or weak) on actions A with $A \Rightarrow Next$. We do not
  validate or analyse weak or strong fairness assumptions.

We use TLA to prove invariance properties ($Spec \Rightarrow \Box I$), eventuality properties (footnote 3)
($Spec \Rightarrow \Diamond F$ or $Spec \Rightarrow (P \leadsto Q)$) and refinement properties ($Spec_{ref} \Rightarrow
Spec_{abs}$). Proof rules can be found in [16].

TLA+ is an extension of TLA including predicate calculus and ZF set theory and mech- 
anisms for structuring a specification in a module. A module is a text containing a name, 
a list of definitions (constants, variables, operators, functions, predicates, assumptions, 
theorems, proofs). A specification is made up of several modules that are combined using 
clauses EXTENDS and INSTANCE. The clause EXTENDS imports the definitions of 
specified modules by a macro expansion mechanism; the clause INSTANCE provides 
the mechanism of parametrization and a bloc-like structure. 

We have used TLA+ to model the user's point of view of services of the CS1 [2,8,9,10,11],
a list of telecommunication services defined by ITU. CS1 [23] stands for Capability Set 1.

(Footnote 1) $s[\![x]\!]$ represents the value of x in the state s.

(Footnote 2) Here, we do not take into account stuttering, which is especially useful for refinement.

(Footnote 3) $\Diamond F = \neg\Box\neg F$ and $P \leadsto Q = \Box(P \Rightarrow \Diamond Q)$.
3 Process for animating temporal specifications

The animation process is based on the construction of predicate behaviors, and the
animation uses predicates as states for modelling the current system. A predicate
behavior is simply a sequence of predicates. Let Pred be the language of predicates and
$Pred^\infty = \{\langle S_0, S_1, \ldots\rangle : \forall n \in Nat : S_n \in Pred\}$ the set of predicate behaviors.

For a TLA specification, we construct behaviors of predicates which enable us to represent
the set of state behaviors of the specification. To construct a specification behavior,
we start with Init (considered to be a conjunction of elementary formulas), the property
of any initial state ($s_0[\![Init]\!]$). $Init \wedge Next$ is a property of any following state $s_1$:
$s_0[\![Init]\!] \wedge s_0[\![Next]\!]s_1 = s_0[\![Init \wedge Next]\!]s_1$.

$Init \wedge Next$ is displayed as a disjunction of conjunctions (not proved to be false and
of the form $S^1 \vee \ldots \vee S^i \vee \ldots \vee S^k$), and the user is required to choose one of those
conjunctions (one of the $S^i$), denoted by $choice(\ldots)$.

Each choice is simplified to obtain, if possible, a property using only primed variables.
The choice is then unprimed, substituting the primed variables (x') by the unprimed ones (x);
the unprimed variables are renamed beforehand. For example, if we have $Init = (x = 0)$
and $Next = (x' = x+1 \vee x' = x+2)$, we have two choices for

    Init ∧ Next = x = 0 ∧ (x' = x+1 ∨ x' = x+2)
                = (x' = 0+1 ∨ x' = 0+2) = (x' = 1 ∨ x' = 2)



This enables the construction of a behavior which can be defined as follows:

$$S_0 = Init \qquad S_{n+1} = unprimed(choice(S_n \wedge Next))$$
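The construction can be illustrated on the running example Init = (x = 0), Next = (x' = x+1 ∨ x' = x+2). The sketch below is a strong simplification and not the authors' Logic-Solver tool: every predicate reached has the form "x = c", so it is represented by the integer c alone, and the user's choice is modelled by a function `choose`. Both names are assumptions.

```python
def choices(value):
    # S_n is "x = value". After simplification and unpriming, S_n /\ Next
    # offers two conjunctions: "x = value + 1" and "x = value + 2".
    return [value + 1, value + 2]

def build_behavior(init, choose, steps):
    # S_0 = Init, S_{n+1} = unprimed(choice(S_n /\ Next))
    behavior = [init]
    for _ in range(steps):
        options = choices(behavior[-1])
        if not options:
            break  # S_n /\ Next = FALSE: the behavior cannot be extended
        behavior.append(choose(options))
    return behavior

always_first = build_behavior(0, lambda options: options[0], 3)
always_second = build_behavior(0, lambda options: options[1], 3)
```

Each run yields one predicate behavior among the many that the specification admits, depending on the choices made at each step.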

When no choice is possible ($S_n \wedge Next = FALSE$), the behavior cannot be extended,
but the user can quit the analyser, backtrack in the behavior, and so on. At each
step, our tool tests whether the invariant I holds by trying to prove (with our rules
written in Logic-Solver) the property $S_n \Rightarrow I$. Obviously, even if the invariant has been
tested on some behaviors, it has not been proved. On the other hand, if the
invariant I is false for a state, it is violated whenever this state is reachable. $S_n$ and $S_{n+1}$ satisfy
the property $S_n \wedge S'_{n+1} \Rightarrow Next$, and the following property holds.

Let PredBeh(Spec) be the set of predicate behaviors of the specification Spec.

$$\langle S_0, \ldots, S_n, \ldots\rangle \in PredBeh(Spec) \iff [\, S_0 = Init \wedge \forall n \ge 0 : S_n \wedge S'_{n+1} \Rightarrow Next \,]$$

We can interpret TLA specifications. We use the way of writing formulas defined by L.
Lamport in [15]. Let us take the following simple TLA+ module by Leslie Lamport [16]:

    MODULE Increment_with_semaphore

    EXTENDS Natural

    VARIABLES xx, yy, pc1, pc2, sem

    Init ≜ ∧ pc1 = "a" ∧ pc2 = "a" ∧ sem = 1
           ∧ xx = 0 ∧ yy = 0

    Change(p1, p2, zx, zy) ≜
        ∨ ∧ p1 = "a" ∧ sem > 0 ∧ p1' = "b"
          ∧ sem' = sem − 1
          ∧ UNCHANGED ⟨p2, zx, zy⟩
        ∨ ∧ p1 = "b" ∧ p1' = "g" ∧ zx' = zx+1
          ∧ UNCHANGED ⟨p2, sem, zy⟩
        ∨ ∧ p1 = "g" ∧ p1' = "a"
          ∧ sem' = sem+1
          ∧ UNCHANGED ⟨p2, zx, zy⟩

    Incx ≜ Change(pc1, pc2, xx, yy)

    Incy ≜ Change(pc2, pc1, yy, xx)

    Next ≜ Incx ∨ Incy
    Spec ≜ Init ∧ □[Next]

An example of animation: construction of the two successors from Init

    Init ∧ Next =
      ∧ pc1 = "a" ∧ pc2 = "a" ∧ sem = 1 ∧ xx = 0 ∧ yy = 0
      ∧ ∨ ∧ pc1 = "a" ∧ sem > 0 ∧ pc1' = "b"
          ∧ sem' = sem−1 ∧ pc2' = pc2 ∧ xx' = xx ∧ yy' = yy
        ∨ ∧ pc1 = "b" ∧ pc1' = "g" ∧ ...
        ∨ ∧ pc1 = "g" ∧ pc1' = "a" ∧ sem' = sem+1 ∧ ...
        ∨ ∧ pc2 = "a" ∧ sem > 0 ∧ pc2' = "b"
          ∧ sem' = sem−1 ∧ pc1' = pc1 ∧ xx' = xx ∧ yy' = yy
        ∨ ∧ pc2 = "b" ∧ pc2' = "g" ∧ ...
        ∨ ∧ pc2 = "g" ∧ pc2' = "a" ∧ sem' = sem+1 ∧ ...
    =
        ∨ ∧ "a" = "a" ∧ 1 > 0 ∧ pc1' = "b"
          ∧ sem' = 1−1 ∧ pc2' = "a" ∧ xx' = 0 ∧ yy' = 0
        ∨ ∧ "a" = "b" ∧ pc1' = "g" ∧ ...
        ∨ ∧ "a" = "g" ∧ pc1' = "a" ∧ sem' = 1+1 ∧ ...
        ∨ ∧ "a" = "a" ∧ 1 > 0 ∧ pc2' = "b"
          ∧ sem' = 1−1 ∧ pc1' = "a" ∧ xx' = 0 ∧ yy' = 0
        ∨ ∧ "a" = "b" ∧ pc2' = "g" ∧ ...
        ∨ ∧ "a" = "g" ∧ pc2' = "a" ∧ sem' = 1+1 ∧ ...

The set of predicate behaviors which one can traverse is infinite (see Figure 1)
because xx and yy can take all the values in Nat. In addition, all the predicates which
one can reach are state behaviors ($\bigwedge_i (var_i = val_i)$). Our interpreter allows us, in theory,
to traverse the complete tree of Figure 1, as well as a more abstract part
with aa1(vx, vy) as initial predicate.

Abstract animation builds abstract predicates, each of which gathers a (generally infinite)
set of concrete predicates. For instance, if $\widehat{S}_n$ is an abstraction of $S_n$, i.e. an
interpretation of $S_n$ according to a choice of abstract values, we can extend the behavior
by constructing an abstract predicate $\widehat{S}_{n+1}$ such that
$\widehat{S}_n \land \widehat{S}'_{n+1} \Rightarrow Next$. We use abstract interpretation to
achieve this goal.

[Figure 1: tree drawing not recoverable from the source; its nodes are labelled by instances of the predicates defined below]

with

aa1(vx, vy) ≜ pc1 = "a" ∧ pc2 = "a" ∧ sem = 1 ∧ xx = vx ∧ yy = vy
ba0(vx, vy) ≜ pc1 = "b" ∧ pc2 = "a" ∧ sem = 0 ∧ xx = vx ∧ yy = vy
ab0(vx, vy) ≜ pc1 = "a" ∧ pc2 = "b" ∧ sem = 0 ∧ xx = vx ∧ yy = vy
ga0(vx, vy) ≜ pc1 = "g" ∧ pc2 = "a" ∧ sem = 0 ∧ xx = vx ∧ yy = vy
ag0(vx, vy) ≜ pc1 = "a" ∧ pc2 = "g" ∧ sem = 0 ∧ xx = vx ∧ yy = vy

Fig. 1. Parts of the execution tree of Increment

4 Abstract interpretations 

The critical point of our approach is to find a «good» approximation of a predicate
$S_n$ which gives a «good» evaluation of $S_{n+1}$. We define a partial abstraction, which
approximates only some of the variables, and a selective abstraction, which approximates
different variables under different abstractions. We can also compose the two types of
abstraction.

The semantics of predicates is based on the states (of variables). Recall that $s[\![P]\!]$
is the value of P in the state s. We define the validity domain of a predicate P in the
following way:

$Valid(P) = \{ s \in State : s[\![P]\!] \}$

A state can be represented by a function in $Var \to Nat$. A set of states is a set of such
functions (an element of $\wp(Var \to Nat)$). Nat is an example of a set of possible values.
We recall that a Galois connection is a structure of the form:

$(L_1, \sqsubseteq_1) \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} (L_2, \sqsubseteq_2)$

where the following condition is required:

$\forall x_1 \in L_1, x_2 \in L_2 : x_1 \sqsubseteq_1 \gamma(x_2) \iff \alpha(x_1) \sqsubseteq_2 x_2$
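
The Galois condition can be checked exhaustively on a small example. The sketch below is a Python illustration under stated assumptions: the finite universe {0, 1, 2, 3} stands in for Nat, and the four-element lattice (bot, zero, pos, top) is a hypothetical abstract domain chosen for this example, not one taken from the paper.

```python
from itertools import combinations

UNIVERSE = frozenset(range(4))      # finite stand-in for Nat (assumption)
BOT, ZERO, POS, TOP = "bot", "zero", "pos", "top"
LATTICE = [BOT, ZERO, POS, TOP]     # hypothetical abstract domain L


def leq(a, b):
    """Abstract order: bot below everything, top above everything."""
    return a == b or a == BOT or b == TOP


def alpha(xs):
    """Abstraction of a set of naturals."""
    if not xs:
        return BOT
    if xs == {0}:
        return ZERO
    if 0 not in xs:
        return POS
    return TOP


def gamma(a):
    """Concretization of an abstract value."""
    return {
        BOT: frozenset(),
        ZERO: frozenset({0}),
        POS: UNIVERSE - {0},
        TOP: UNIVERSE,
    }[a]


def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_galois_connection():
    """Check x <= gamma(y)  iff  alpha(x) <= y for every pair (x, y)."""
    return all(
        (x <= gamma(y)) == leq(alpha(x), y)
        for x in subsets(UNIVERSE)
        for y in LATTICE
    )
```

Here `x <= gamma(y)` is set inclusion on the concrete side and `leq` is the order on the abstract side, so the function tests exactly the required equivalence on all 16 × 4 pairs.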
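4.1 Total abstraction

Consider the following Galois connection between the sets of naturals and an abstract
domain L:

$(\wp(Nat), \subseteq) \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} (L, \sqsubseteq)$

By composing the two following Galois connections,

$(\wp(Var \to Nat), \subseteq) \underset{\alpha_r}{\overset{\gamma_r}{\leftrightarrows}} (Var \to \wp(Nat), \subseteq)$

$(Var \to \wp(Nat), \subseteq) \underset{\alpha_c}{\overset{\gamma_c}{\leftrightarrows}} (Var \to L, \sqsubseteq)$

we obtain a Galois connection⁴ to approximate the validity domain of a predicate [4]

$(\wp(Var \to Nat), \subseteq) \underset{\alpha_c \circ \alpha_r}{\overset{\gamma_r \circ \gamma_c}{\leftrightarrows}} (Var \to L, \sqsubseteq)$

where

$\alpha_r(R) = \lambda x \in Var . \{\rho(x) \mid \rho \in R\}$

$\gamma_r(f) = \{\rho \in Var \to Nat : (\forall x \in Var : \rho(x) \in f(x))\}$

$f \subseteq f' = \forall x \in Var : f(x) \subseteq f'(x)$

$\alpha_c(R) = \alpha \circ R$

$\gamma_c(f) = \gamma \circ f$

$f \sqsubseteq f' = \forall x \in Var : f(x) \sqsubseteq f'(x)$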

4.2 Partial abstraction 

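To approximate the variables of $Var_1$ (where $Var_1 \subseteq Var$ and $Var_2 = Var - Var_1$) with
$\alpha$, we can compose the two following Galois connections:

$(\wp(Var \to Nat), \subseteq) \underset{\alpha_{r/Var_1}}{\overset{\gamma_{r/Var_1}}{\leftrightarrows}} ((Var_1 \to \wp(Nat)) \times \wp(Var_2 \to Nat), \subseteq_2)$

$((Var_1 \to \wp(Nat)) \times \wp(Var_2 \to Nat), \subseteq_2) \underset{\alpha_{c/Var_1}}{\overset{\gamma_{c/Var_1}}{\leftrightarrows}} ((Var_1 \to L) \times \wp(Var_2 \to Nat), \sqsubseteq_2)$

where

$\alpha_{r/Var_1}(R) = (\lambda x \in Var_1 . \{\rho(x) : \rho \in R\}) \times R_{/Var_2}$

$\gamma_{r/Var_1}((f_1, f_2)) = \{\rho : \forall x \in Var_1 : (\rho_{/Var_1}(x) \in f_1(x)) \land \rho_{/Var_2} \in f_2\}$

$(f_1, f_2) \subseteq_2 (f_1', f_2') = \forall x \in Var_1 : f_1(x) \subseteq f_1'(x) \land f_2 \subseteq f_2'$

$\alpha_{c/Var_1}((R_1, R_2)) = (\alpha \circ R_1, R_2)$

$\gamma_{c/Var_1}((f_1, f_2)) = (\gamma \circ f_1, f_2)$

$(f_1, f_2) \sqsubseteq_2 (f_1', f_2') = \forall x \in Var_1 : f_1(x) \sqsubseteq f_1'(x) \land f_2 \subseteq f_2'$



⁴ A composition of two Galois connections is a Galois connection.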
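4.3 Selective abstraction

With the two following Galois connections between the sets of naturals and abstract
domains $L_1$ and $L_2$

$(\wp(Nat), \subseteq) \underset{\alpha_1}{\overset{\gamma_1}{\leftrightarrows}} (L_1, \sqsubseteq_1) \qquad (\wp(Nat), \subseteq) \underset{\alpha_2}{\overset{\gamma_2}{\leftrightarrows}} (L_2, \sqsubseteq_2)$

we can approximate the variables of $Var_1$ with the abstraction function $\alpha_1$ and the other
variables ($Var_2 = Var - Var_1$) with the abstraction function $\alpha_2$: we compose
$(\alpha_r, \gamma_r)$ with the following Galois connection:

$(Var \to \wp(Nat), \subseteq) \underset{\alpha_{c12}}{\overset{\gamma_{c12}}{\leftrightarrows}} ((Var_1 \to L_1) \times (Var_2 \to L_2), \sqsubseteq_{12})$

where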



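$\alpha_{c12}(R) = (\alpha_1 \circ R_{/Var_1}, \alpha_2 \circ R_{/Var_2})$

$\gamma_{c12}((f_1, f_2)) = \{\rho \in Var \to Nat : \forall x \in Var_1 : (\rho_{/Var_1}(x) \in f_1(x)) \ldots\}$

$(f_1, f_2) \sqsubseteq_{12} (f_1', f_2') = \forall x \in Var_1 : f_1(x) \sqsubseteq_1 f_1'(x) \land \forall x \in Var_2 : f_2(x) \sqsubseteq_2 f_2'(x)$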



5 Defining an abstraction for TLA 

We present here a simplified version of TLA to make our approach easier to understand.



5.1 Expressions 

Var is a set of variables. An expression exp is a term in EXP(Var, NatString) where
NatString is Nat ∪ String. We can model expressions using the following grammar:

$exp \to exp + exp \mid exp \cdot exp \mid (exp) \mid nb \mid var \mid string$

The semantics of an expression exp is defined on states s. Let e be an expression;
$s[\![e]\!]$ is defined by induction in the following way:

$s[\![nb]\!] = nb$

$s[\![exp_1\ op\ exp_2]\!] = s[\![exp_1]\!]\ op\ s[\![exp_2]\!]$

$s[\![var]\!] =$ value of variable var in s

This semantics defines calculation rules (CR) on expressions; it provides evaluation and
simplification of expressions, such as:

$nb_1\ op\ nb_2 \longrightarrow (nb_1\ op\ nb_2)$
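
The inductive semantics and the simplification rule above can be sketched as follows. This is a Python illustration only: the tuple encoding of expressions and the names `evaluate` and `simplify` are assumptions of this example, not part of the authors' tool.

```python
import operator

# Hypothetical encoding of the grammar:
#   nb      -> a Python int
#   var     -> ("var", name)
#   string  -> ("str", text)
#   e1 op e2 -> (op, e1, e2) with op in {"+", "*"}
OPS = {"+": operator.add, "*": operator.mul}


def evaluate(state, e):
    """s[[e]]: the value of expression e in state s (a dict)."""
    if isinstance(e, int):
        return e
    tag = e[0]
    if tag == "str":
        return e[1]
    if tag == "var":
        return state[e[1]]
    return OPS[tag](evaluate(state, e[1]), evaluate(state, e[2]))


def simplify(e):
    """Apply the rule nb1 op nb2 -> (nb1 op nb2) bottom-up.

    Sub-expressions containing variables or strings are left in place,
    so the result is a partially evaluated expression.
    """
    if isinstance(e, int) or e[0] in ("str", "var"):
        return e
    left, right = simplify(e[1]), simplify(e[2])
    if isinstance(left, int) and isinstance(right, int):
        return OPS[e[0]](left, right)
    return (e[0], left, right)
```

For instance, `simplify(("+", ("var", "xx"), ("*", 2, 2)))` folds the constant product but keeps the variable, which is the kind of partial evaluation an animator needs before a state is known.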



5.2 Predicates 

A predicate Pred is a term in PRED(Var, NatString) where NatString is
Nat ∪ String. We model predicates using the following grammar:




292 



D. Cansell and D. Mery 



$Pred \to Pred \land Pred \mid Pred \lor Pred \mid \lnot Pred$
$\quad \mid exp = exp \mid exp > exp$
$\quad \mid \cdots$
$\quad \mid TRUE \mid FALSE$



The semantics of a predicate Pred is defined on states s. Let Pred be a predicate;
$s[\![Pred]\!]$ is defined by induction in the following way:

$s[\![Pred_1 \land Pred_2]\!] = s[\![Pred_1]\!] \land s[\![Pred_2]\!]$

$s[\![Pred_1 \lor Pred_2]\!] = s[\![Pred_1]\!] \lor s[\![Pred_2]\!]$

$s[\![\lnot Pred_1]\!] = \lnot s[\![Pred_1]\!]$

$s[\![exp_1\ oprel\ exp_2]\!] = s[\![exp_1]\!]\ oprel\ s[\![exp_2]\!]$

$s[\![TRUE]\!] = TRUE$

$s[\![FALSE]\!] = FALSE$

This semantics defines calculation rules (CR) on predicates for evaluating and
simplifying them:

$TRUE \land FALSE \longrightarrow FALSE$

$(4 > 2 + 1) \longrightarrow TRUE$



5.3 TLA, an untyped language

TLA/TLA+ is an untyped language [18]: the value of a variable is a set. Variables can
contain values of multiple types, as shown in the following example:

MODULE Increment_Machine

EXTENDS Natural
CONSTANT maxint
VARIABLES xx
ASSUME maxint ∈ Nat

Init ≜ (xx = 0)
Next ≜ ∨ (xx = "notcodable") ∧ xx' = xx
       ∨ (xx < maxint) ∧ xx' = xx + 1
       ∨ (xx = maxint) ∧ xx' = "notcodable"
Spec ≜ Init ∧ □[Next]_xx


Our abstract interpretation has to handle the untyped nature of TLA/TLA+. We restrict 
the scope of our work to the set of naturals and the set of character strings, as values for 
variables. 



$State \in \wp(Var \to NatString)$

By composing the two following Galois connections, we approximate the validity domain
of a predicate

$(\wp(Var \to NatString), \subseteq) \underset{\alpha_{rtla}}{\overset{\gamma_{rtla}}{\leftrightarrows}} (Var \to \wp(NatString), \subseteq)$

$(Var \to \wp(NatString), \subseteq) \underset{\alpha_{ctla}}{\overset{\gamma_{ctla}}{\leftrightarrows}} (Var \to L \cup \wp(String), \sqsubseteq)$

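When one analyses a behavior, one can split state variables over L from state variables
over $\wp(String)$: $x \in L \cup \wp(String)$ is split into $x \in L \lor x \in \wp(String)$, since if
$S_n \land S'_{n+1} \Rightarrow Next$ and $S_{n+1} = S^1 \lor S^2$, then
$S_n \land S^{1\prime} \Rightarrow Next$ and $S_n \land S^{2\prime} \Rightarrow Next$.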

With our approach, we construct the following graph (see Figure 2), which models
all possible executions of the module Increment_Machine:

[Figure 2: graph drawing not recoverable from the source; its nodes are labelled by the predicates defined below]

Fig. 2. Graph for abstract execution of Increment_Machine



with

InfMaxint ≜ xx < maxint
Maxint ≜ xx = maxint
Notcodable ≜ xx = "notcodable"
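
To give an idea of how the three predicates partition the executions of Increment_Machine, here is a small Python sketch. It is not the authors' symbolic method: it enumerates concrete states for one fixed value of maxint (an assumption needed to make the state space finite) and labels each transition with the predicates that hold before and after. The function names are hypothetical.

```python
def successors(xx, maxint):
    """Concrete Next of Increment_Machine; xx is a natural or a string."""
    if xx == "notcodable":
        yield xx                    # xx' = xx
    elif xx < maxint:
        yield xx + 1                # xx' = xx + 1
    elif xx == maxint:
        yield "notcodable"          # the variable changes type here


def label(xx, maxint):
    """Name of the predicate satisfied by a reachable value of xx."""
    if xx == "notcodable":
        return "Notcodable"
    return "Maxint" if xx == maxint else "InfMaxint"


def abstract_graph(maxint):
    """Edges between predicate labels, explored from Init (xx = 0)."""
    edges, seen, todo = set(), set(), [0]
    while todo:
        xx = todo.pop()
        if xx in seen:
            continue
        seen.add(xx)
        for nxt in successors(xx, maxint):
            edges.add((label(xx, maxint), label(nxt, maxint)))
            todo.append(nxt)
    return edges
```

The same variable xx holds a natural in some states and a string in others, which is the untyped behavior the abstraction has to accommodate.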



6 Experiments of the abstract animation 

6.1 Implementation of an abstract animator 

The abstraction function is realized using Logic-Solver, the theory language of
Atelier B [22] (and of our animator [3]). Logic-Solver handles terms and can
rewrite them. We must detect in a predicate the variables that we can abstract. The
abstract predicate is obtained by injecting abstract values into the concrete predicate (lifted
and transformed). We implemented several abstract interpretations: the positive numbers, the
signs (strictly positive, strictly negative, zero, ...), the integer intervals,
and the congruences (even, odd, ...). In the following example, we choose the
function positive:

$positive : Nat \to \{Supeo\}$
$n \mapsto Supeo$

and we assume the following calculation rules CR({Supeo}) [6]:

$Supeo + Supeo \longrightarrow Supeo$
$Supeo \cdot Supeo \longrightarrow Supeo$
$Supeo \ge 0 \longrightarrow TRUE$
$Supeo < 0 \longrightarrow FALSE$
$Supeo \in Nat \longrightarrow TRUE$
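
The rule set can be pictured as a lookup table over abstract terms. The Python sketch below is only an illustration, not Logic-Solver syntax: the table `RULES`, the function `rewrite` and the string encoding of operators are assumptions of this example, and the third rule is read as Supeo ≥ 0, which is also an assumption about the source.

```python
SUPEO = "Supeo"   # abstract value standing for any natural number


def positive(n):
    """The abstraction function positive : Nat -> {Supeo}."""
    if not (isinstance(n, int) and n >= 0):
        raise ValueError("positive is only defined on naturals")
    return SUPEO


# Hypothetical table encoding of the five calculation rules.
RULES = {
    ("+", SUPEO, SUPEO): SUPEO,
    ("*", SUPEO, SUPEO): SUPEO,
    (">=", SUPEO, 0): True,
    ("<", SUPEO, 0): False,
    ("in", SUPEO, "Nat"): True,
}


def rewrite(op, left, right):
    """Result of the matching rule, or None when no rule applies."""
    return RULES.get((op, left, right))
```

A term such as `rewrite(">", SUPEO, 0)` returns None: no rule decides a strict comparison with zero, so an animator built on these rules would be unable to evaluate a guard of that shape.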



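We can define our abstraction function $\beta$, which approximates the variables in $Var_1$, a
subset of Var: $\beta = \delta_{Var_1}$, where $\delta_{\{x,\ldots\}}$ is our realization ($\{x, \ldots\}$ is the set of
variables that we want to abstract, and n is a natural):

$\delta_{\{x,\ldots\}} : PRED(Var, NatString) \to PRED(Var, NatString \cup \{Supeo\})$

$(x = n) \land P(x) \mapsto (x = Supeo) \land \delta_{\{\ldots\}}(P(Supeo))$

$(x > n) \land P(x) \mapsto (x = Supeo) \land \delta_{\{\ldots\}}(P(Supeo))$

$(x \ge n) \land P(x) \mapsto (x = Supeo) \land \delta_{\{\ldots\}}(P(Supeo))$

$(x \in Nat) \land P(x) \mapsto (x = Supeo) \land \delta_{\{\ldots\}}(P(Supeo))$

with the identity $Id_{PRED(Var, NatString)}$ on the other predicates.

We have a transformation (lifting) of the following style: $(x = 3) \land (x > 2)$ is lifted
into $(x = 3) \land (3 > 2)$, which is evaluated to $(x = 3) \land TRUE$, then $(x = 3)$.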



Our calculation rules (in the abstract domain) are the union of the rules on (concrete)
predicates, the abstract rules CR({Supeo}), and gluing calculation rules between abstract
and concrete rules, such as $nb + Supeo \longrightarrow Supeo$:

$CR(PRED(Var, NatString \cup \{Supeo\})) =$
$\quad CR(PRED(Var, NatString))$
$\quad \cup\ CR(\{Supeo\})$
$\quad \cup\ CR(\text{gluing between } Nat \text{ and } \{Supeo\})$



We must be careful if only x is approximated:

$(x = 1)$ is approximated into $(x = Supeo)$

$(x = 1) \land (y = x)$ is approximated into $(x = Supeo) \land (y = Supeo)$⁵

$(x = 1) \land (y = 1)$ is approximated into $(x = Supeo) \land (y = 1)$

Let us define $\widehat{S}_0 = \beta(S_0)$; we have by definition $S_0 \Rightarrow \beta^{-1}(\widehat{S}_0)$, and we build
$\widehat{S}_1$ in a similar manner from the calculation of $S_1$, by evaluating $\widehat{S}_0 \land Next$. This
evaluation injects into Next the abstract values from $\widehat{S}_0$. Since Next contains only concrete
values, the approximation of $S_0$ must be fine enough for the successors to be evaluated and
correctly interpreted.

On the other hand, when the evaluation of an abstract successor does not lead to an abstract
state, this can give us information for changing the abstraction function: we can backtrack and
change it. We define the following set of abstract predicate behaviors:
let trace(Spec) be the set of abstract predicate behaviors of the specification Spec.

$(\widehat{S}_0, \ldots, \widehat{S}_n, \ldots) \in trace(Spec) \iff (\widehat{S}_0 = \beta(S_0)) \land \forall n \ge 0 : \widehat{S}_n \land \widehat{S}'_{n+1} \Rightarrow Next$



⁵ The equality over predicates should be controlled, and we must not have a rule such as $\forall x.(x = x)$,
since it allows one to evaluate «Supeo = Supeo» as TRUE, which is wrong.
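Our interpretation will be considered «good» or «acceptable» if we can interpret
$\widehat{S}_{n+1}$; the choice (if it exists) will be driven by this strategy. We accept
only abstract predicates $\widehat{S}_n$ such that we can validate $Valid(\widehat{S}_n)$ (see the diagram).

$\begin{array}{ccc} PRED(Var, NatString) & & PRED(Var, NatString \cup \{Supeo\}) \\ \downarrow Valid & & \downarrow Valid \\ (\wp(Var \to NatString), \subseteq) & \underset{\alpha_{ctla} \circ \alpha_{rtla}}{\overset{\gamma_{rtla} \circ \gamma_{ctla}}{\leftrightarrows}} & (Var \to \{Supeo\} \cup \wp(String), \sqsubseteq) \end{array}$

Let $(S_0, \ldots, S_n, \ldots)$ ($\in PredBeh(Spec)$) be a behavior of the specification Spec; there
is a behavior of abstract predicates (possibly noninterpretable) starting from $\widehat{S}_0$ such that
$\forall n \ge 0 : S_n \Rightarrow \beta^{-1}(\widehat{S}_n)$.

Proof by induction

1) n = 0: obvious

2) n → n+1:

we take $\widehat{S}_{n+1}$ such that $\widehat{S}_n \land \widehat{S}'_{n+1} = \widehat{S}_n \land Next$;

then we have $S_n \land S'_{n+1} \Rightarrow S_n \land Next \Rightarrow \beta^{-1}(\widehat{S}_n) \land Next = \beta^{-1}(\widehat{S}_n \land Next) = \beta^{-1}(\widehat{S}_n \land \widehat{S}'_{n+1}) = \beta^{-1}(\widehat{S}_n) \land \beta^{-1}(\widehat{S}'_{n+1})$,
and so the proof is correct if $Valid(S_n) \neq \emptyset$ ($S_n \neq FALSE$).

QED

We have $\beta^{-1}(A \land B) = \beta^{-1}(A) \land \beta^{-1}(B)$ and $\beta^{-1}(Next) = Next$ because Next is
in the concrete world.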

6.2 Choice of the good abstraction 

When we select a bad abstraction, the abstract interpreter leads us to predicates which
cannot be interpreted. Our interpreter can thus be used to help choose a
good abstraction. For example, at the beginning of the increment animation, we can interpret
the initial predicate in an abstract way by interpreting the three variables sem, xx and
yy, because $x = 0 \lor x = 1 \Rightarrow x \ge 0$. We can thus interpret one or all of these
variables by the abstract predicates sem = Supeo, xx = Supeo and yy = Supeo. The
first predicate does not give a good behavior because we cannot evaluate
Supeo > 0. The abstract Init is:

Init = $\widehat{S}_0$ = xx = Supeo ∧ yy = Supeo ∧ sem = Supeo ∧ pc1 = "a" ∧ pc2 = "a"

Init ∧ Next =

    ∨ ∧ Supeo > 0 ∧ pc1' = "b" ∧ xx' = Supeo ∧ yy' = Supeo
      ∧ sem' = Supeo − 1 ∧ pc2' = "a"
    ∨ ∧ Supeo > 0 ∧ pc2' = "b" ∧ xx' = Supeo ∧ yy' = Supeo
      ∧ sem' = Supeo − 1 ∧ pc1' = "a"

We cannot interpret Supeo > 0 and sem' = Supeo − 1. We may choose an abstraction
with Supeo and Supo (> 0) only for sem. The abstract $\widehat{S}_1$ are:

Supo > 0 ∧ pc1' = "b" ∧ xx' = Supeo ∧ yy' = Supeo ∧ sem' = Supo − 1 ∧ pc2' = "a"

TRUE ∧ pc1' = "b" ∧ xx' = Supeo ∧ yy' = Supeo ∧ sem' = Supeo ∧ pc2' = "a"

pc1' = "b" ∧ xx' = Supeo ∧ yy' = Supeo ∧ sem' = Supeo ∧ pc2' = "a"

These two possibilities lead to the incrementation of xx or yy. However, our
interpreter also proposes the two following possibilities:

Supeo > 0 ∧ pc1' = "b" ∧ xx' = Supeo ∧ yy' = Supeo ∧ sem' = Supeo − 1 ∧ pc2' = "b"

On the other hand, if we leave the concrete value 1 for sem, we can always evaluate
each reachable state correctly.

Moreover, the concrete values of sem make it possible to construct a finite graph (Figure 3), which
defines the abstract set of predicate behaviors. This graph allows us «to prove»
the invariant $xx \ge 0 \land yy \ge 0$ and to discover other invariants like $sem \in \{0, 1\}$,
which is not that simple⁶ to prove without building the strongest invariant of the
specification.




with

aa1 ≜ pc1 = "a" ∧ pc2 = "a" ∧ sem = 1 ∧ xx = Supeo ∧ yy = Supeo
ba0 ≜ pc1 = "b" ∧ pc2 = "a" ∧ sem = 0 ∧ xx = Supeo ∧ yy = Supeo
ab0 ≜ pc1 = "a" ∧ pc2 = "b" ∧ sem = 0 ∧ xx = Supeo ∧ yy = Supeo
ga0 ≜ pc1 = "g" ∧ pc2 = "a" ∧ sem = 0 ∧ xx = Supeo ∧ yy = Supeo
ag0 ≜ pc1 = "a" ∧ pc2 = "g" ∧ sem = 0 ∧ xx = Supeo ∧ yy = Supeo



Fig. 3. Graph for abstract execution of Increment 
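
The finiteness of the abstract graph can be illustrated with a short exploration. The Python sketch below is an assumption-laden illustration rather than the authors' Logic-Solver animator: states are dictionaries, xx and yy carry the abstract value Supeo while pc1, pc2 and sem stay concrete, and incrementing Supeo is taken to give Supeo (the gluing rule nb + Supeo → Supeo).

```python
SUPEO = "Supeo"   # abstract value for xx and yy


def inc(value):
    """zx + 1, using the gluing rule when the value is abstract."""
    return SUPEO if value == SUPEO else value + 1


def change(s, p1, zx):
    """Abstract successors for one process; other variables are copied."""
    if s[p1] == "a" and s["sem"] > 0:
        yield {**s, p1: "b", "sem": s["sem"] - 1}
    if s[p1] == "b":
        yield {**s, p1: "g", zx: inc(s[zx])}
    if s[p1] == "g":
        yield {**s, p1: "a", "sem": s["sem"] + 1}


def next_states(s):
    yield from change(s, "pc1", "xx")
    yield from change(s, "pc2", "yy")


def reachable(init):
    """All abstract states reachable from init (worklist exploration)."""
    seen, todo = set(), [init]
    while todo:
        s = todo.pop()
        key = tuple(sorted(s.items()))   # hashable form of the state
        if key in seen:
            continue
        seen.add(key)
        todo.extend(next_states(s))
    return [dict(key) for key in seen]


AA1 = {"pc1": "a", "pc2": "a", "sem": 1, "xx": SUPEO, "yy": SUPEO}
states = reachable(AA1)
```

Because sem is kept concrete, every guard sem > 0 can be evaluated, and the exploration terminates with the five states named aa1, ba0, ab0, ga0 and ag0; reading off the values of sem in those states gives the candidate invariant sem ∈ {0, 1}.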



6.3 Computing invariants 

The set of reachable predicates defines a set of reachable states and hence an invariant of
a specification. Let $Spec = Init \land \Box[Next]_x$ be a TLA specification; then

$\bigvee_{(S_0, \ldots, S_n, \ldots) \in PredBeh(Spec)} \bigvee_i S_i$

is an invariant of the specification Spec because

$Spec \Rightarrow \Box\Big( \bigvee_{(S_0, \ldots, S_n, \ldots) \in PredBeh(Spec)} \bigvee_i S_i \Big)$

Proof:

1) $Init \Rightarrow \bigvee_{(S_0, \ldots, S_n, \ldots) \in PredBeh(Spec)} \bigvee_i S_i$ because $S_0 = Init$

2) $\Big( \bigvee_{(S_0, \ldots, S_n, \ldots) \in PredBeh(Spec)} \bigvee_i S_i \Big) \land Next \Rightarrow \bigvee_{(S_0, \ldots, S_n, \ldots) \in PredBeh(Spec)} \bigvee_i S'_i$

that is, $S_i \land Next \Rightarrow \bigvee_{(S_0, \ldots, S_n, \ldots) \in PredBeh(Spec)} \bigvee_i S'_i$:

we take $S_{i+1}$ such that $S_i \land S'_{i+1} = S_i \land Next$, and then $S_{i+1}$ is in PredBeh(Spec); then
$S_i \land S'_{i+1} \Rightarrow Next \Rightarrow S'_{i+1} \Rightarrow \bigvee_{(S_0, \ldots, S_n, \ldots) \in PredBeh(Spec)} \bigvee_i S'_i$

We can easily prove

$Spec \Rightarrow \Box\Big( \bigvee_{(S_0, \ldots, S_n, \ldots) \in PredBeh(Spec)} \bigvee_i \beta^{-1}(S_i) \Big)$

QED

⁶ To prove this invariant, we must prove $sem = 1 \land sem' = sem + 1 \land pc_1 = \text{"g"} \land pc'_1 = \text{"a"} \Rightarrow sem' \in \{0, 1\}$, which is not provable.

For the increment example (with pc1, pc2 and sem) we obtain the invariant:

aa1 ∨ ba0 ∨ ab0 ∨ ga0 ∨ ag0

with

aa1 ≜ pc1 = "a" ∧ pc2 = "a" ∧ sem = 1 ∧ xx ≥ 0 ∧ yy ≥ 0
ba0 ≜ pc1 = "b" ∧ pc2 = "a" ∧ sem = 0 ∧ xx ≥ 0 ∧ yy ≥ 0
ab0 ≜ pc1 = "a" ∧ pc2 = "b" ∧ sem = 0 ∧ xx ≥ 0 ∧ yy ≥ 0
ga0 ≜ pc1 = "g" ∧ pc2 = "a" ∧ sem = 0 ∧ xx ≥ 0 ∧ yy ≥ 0
ag0 ≜ pc1 = "a" ∧ pc2 = "g" ∧ sem = 0 ∧ xx ≥ 0 ∧ yy ≥ 0



7 Conclusion and future works 

We have used the framework of abstract interpretation to state ideas that we initially
developed with Logic-Solver. We can now improve the process of animation and, for
instance, help in finding invariants. Our experiment with Logic-Solver provides us with a tool
for validating formal specifications developed by somebody else: even if TLA is a
powerful way to model systems, it hides an inherent complexity of systems. Our tool tries
to cope with this complexity, which lies in the relationship between the specifier
and the customer. It produces a list of questions that are submitted to the customer for
validation, and it helps in constructing a correct specification, namely
a specification whose theorems are provable. Dominique Mery played the role of
the customer and submitted TLA+ specifications of POTS and several services; he
also played the role of the specifier. Dominique Cansell translated and analysed the
specifications and provided a list of questions to Dominique Mery, who made
some corrections. Following Leslie Lamport's philosophy for modelling systems,
we have preserved the untyped character of TLA+, mainly because it is based on
set theory, and we have preserved the freedom of interpreting temporal formulas. Logic-Solver
is powerful, and one can very easily add rules that were forgotten or
that are needed to refine the analysis. We have shown how abstract interpretations can be
helpful when analysing temporal specifications. We clearly still have to develop
the interface of our program, since it is mainly used for communicating information to the
customer. On the theoretical side, we have not completely forgotten the question of
analysing fairness constraints: we have to define what it means «to animate
a specification stating fairness constraints». We have to process other large case studies
and develop other abstractions. Finally, the implementation was carried out with Logic-Solver,
which provides us with a real laboratory for experimenting with our abstractions; we now
need to produce a better interface in order to offer the tool on the Web and test it at
large scale.
Abstract. As imperative programs get larger and larger, many interprocedural
analyses produce progressively poorer results, even as the
(usually) polynomial cost hurts more and more. The problem is not just 
program size, but that larger programs contain analysis defeating idioms 
(e.g., manual persistence methods and dispatch vectors) and deeper lev- 
els of procedural abstraction. In this talk, I will discuss what my group 
has learned about analyzing mega-programs (applications exceeding one 
million lines of non-blank, non-comment source lines). I will report on our
approaches that have and haven’t worked, our attempts to get meaning- 
ful information from analyses with near-linear complexities, engineering 
issues that must be resolved when dealing with mega-programs, and our 
best guesses regarding the eventual feasibility of routinely performing 
static analysis on mega-programs. 



We are concerned with interprocedural analysis of imperative programs. For 
each program point p, an interprocedural analysis computes a set of invariants 
that over-approximate the invariants reaching p along feasible interprocedural 
paths. The amount of overapproximation depends upon the amount of abstrac- 
tion used in the analysis, and the over-approximation of feasible paths. The 
standard tradeoff is time and space requirements versus over-approximation: an 
analysis method saves time and space by considering more paths as feasible and 
by using coarser abstractions of the precise program semantics. 

The talk will primarily concern our experiences analyzing mega-programs, 
which is what we call applications that exceed one million lines of non-blank, 
non-comment source lines. The list below summarizes our experiences, where 
the significance of an item and its location in the list are not correlated. 

— Success requires thinking outside the box of standard quadratic/cubic time
or space complexity algorithms. We use two methods for getting outside the
box: inventing algorithms with near-linear time and space complexities, and
inventing algorithms with near-linear average time and space complexity,
regardless of worst-case complexity (or even decidability). Indeed, near-linear
average time complexity of an analysis is more important than any other
complexity metric.

— All algorithms, even standard, mundane algorithms, need to be carefully
considered and implemented. For example, when dealing with lists containing
millions of members, the weaknesses of quicksort, which can usually be
ignored, stand out when all other algorithms touching those lists are near-linear.
As another example, we had a small bug in our implementation of
Tarjan’s fast union-find algorithm that didn’t affect correctness, but caused
one algorithm to require 40 minutes of processing time rather than 2 minutes.

— Unless an analysis and its implementation have been proved correct, they
are buggy. Testing correctness by running on 100K line programs doesn’t
exercise all the situations that arise in mega-applications. (To add insult to
injury, it’s very hard to debug an analysis when the bug only appears at the
million line mark.) We’ve yet to create an analysis that, after passing with
flying colors on over 100K line programs (the SPEC ’95 benchmarks), didn’t
have at least one remaining bug exposed by running the analysis over Word
’97. So one has to implement extremely carefully.

— As a corollary, most results regarding points-to analysis of large programs
should be treated with suspicion (even our results). The field needs the relevant
researchers to agree to a standard form for publishing the points-to
solutions their analyses compute, and a standard method for comparing results.
This standardization will provide two benefits: researchers will develop
better metrics for summarizing the results of our analyses (the standard of
average size of points-to sets is far too imprecise), and researchers will have
a chance to notice and clean up conceptual and implementation errors. Sharing
of results is required for any large scale analysis that is not validated by
a stringent enough client (such as the optimization phase of a compiler).

— Mega-applications defeat points-to analysis. We have yet to find a practical
points-to analysis that computes usable results on a mega-application. We
haven’t run out of ideas, but don’t know how optimistic to be. Another reason
to be discouraged is the library problem: one doesn’t always have the sources
for libraries and APIs that large programs seem to call extremely frequently.
Soundness may require very conservative, results-killing assumptions.

— Large programs have more quirks in them than small programs do, and
these quirks all work against good analysis results. For example, dynamic
dispatch, application specific memory management, persistent object handling
(wherein an object is treated abstractly and as an array of bytes),
multiple levels of procedural abstraction, and gazillions of global variables
are much worse factors than merely the increasing number of procedures.

— Computers have gotten pretty fast, but one has to use memory very carefully.
A 450 MHz Pentium III with 256MB can perform a monomorphic near-linear
points-to analysis of all of Word ’97 during a long (20 minute) coffee break,
where most of the time is spent in the parser. We achieved this speed by
ensuring that the virtual memory required by the analysis did not exceed
real memory so that our theoretically near-linear algorithms remained near-linear
in practice. We designed our analyses to work on a compilation unit
at a time, and then composed the results via further analysis. This approach
also allows for extremely efficient coarse-grained garbage collection. Without
such modular analysis and garbage collection, our analyses would require
much more memory and would lose their near-linear behavior. Further, our
algorithms are designed to allow data structures to share as much structure as
possible to prevent complexity killing bloat (the typed intermediate-language
researchers have also discovered this important engineering design requirement).

— We designed a polymorphic type inference engine based on semi-unification 
constraints that scaled to mega-applications. Semi-unification constraints en- 
able a fully modular analysis in which compilation units can be processed 
independently in any order, allowing us to scale the analysis in the same way 
we scaled the monomorphic analysis. Of course, it’s not as rapid as monomor- 
phic analysis, requiring roughly 50 minutes total to perform a points-to anal- 
ysis of Word ’97. 

— Griswold’s observation that it is quicker to reparse a function to get its AST
than to reread a saved AST from disk does not hold for mega-applications.
His observation predates the invention and evolution of C++ and compilation
units that routinely read in a million lines of header files.

— The analysis community needs a uniform metric for program size. We advocate
using the number of nodes in a program’s AST as a metric because
it is independent of blank lines, comments, and multi-line macros. GCC
2.52 from the SPEC ’95 benchmark suite comprises 604,000 AST nodes,
whereas Word ’97 comprises 6,077,000 AST nodes. (Interestingly, GCC is
about 140,000 lines, not counting blank lines and comments, and Word ’97
is 1.4 million lines, measured the same way).

— If the client of the analysis does not require soundness or completeness, toss
them out the window as fast as you can. For example, when the client is
a bug-finding system that is allowed to fail to report bugs, and is allowed
to report bugs (infrequently) that don’t exist, unsound approximations can
speed things up greatly. For example, one very successful commercial tool
(Prefix from Intrinsa) does a great job of finding interprocedural bugs in
C and C++ programs while making the unsafe assumption that aliasing
usually doesn’t occur.

— For products intended for the mass market, product groups will not implement
anything that is perceived to be complicated or tentative. Algorithms
that are simple, that compute the least amount of information, and whose
easily measurable benefits virtually always outweigh any increases in compile
time, are likely to have the most impact. For example, Gay and Steensgaard’s
(near-linear complexity) work on stack allocating objects in Java programs
is more likely to have an impact sooner than the much more thorough, but
complex, stack allocation methods that have appeared in the literature.
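
Several of the items above rely on union-find as the workhorse of near-linear analyses. The following Python sketch shows the textbook structure with path compression and union by rank; it is a generic illustration, not the implementation discussed in the talk, and the source does not say which part of the algorithm the performance bug affected. Omitting either optimization keeps the results correct while degrading the running time, which is the kind of defect that is easy to miss on small inputs.

```python
class UnionFind:
    """Disjoint sets with path compression and union by rank."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def find(self, x):
        """Return the representative of x, registering x if it is new."""
        self.parent.setdefault(x, x)
        self.rank.setdefault(x, 0)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return the new root."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return ra
```

In a unification-based points-to analysis, each `union` call would merge the abstract locations of two pointers found to alias; that use is mentioned here only as context and is not implemented in the sketch.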
Abstract. Safety and secrecy are formulated for a deterministic programming
language. A safety property is defined as a set of program
traces and secrecy is defined as a binary relation on traces, character- 
izing a form of Noninterference. Safety properties may have sound and 
complete execution monitors whereas secrecy has no such monitor. 



1 Introduction 

It is often argued that information flow is not safety. One argument is refine- 
ment based and originates with Gray and McLean [5]. They observed that for 
nondeterministic systems, a class of information flow properties, namely the 
Possibilistic Noninterference properties, are not safety properties. The reason is
that they are not preserved under replacement of nondeterminism in a system
with determinism. An example is an implementation of nondeterministic
scheduling using a round-robin time-sliced scheduler [8]. A possibilistic property
basically asserts that certain system inputs do not interfere with the possibility 
of certain events. So nondeterminism is essential to such properties. A safety 
property, on the other hand, is insensitive to this kind of refinement. Another 
argument commonly heard is that information flow is a predicate of trace sets 
whereas safety is a predicate of individual traces. This argument can be applied 
to deterministic systems. We examine it more carefully and present a secrecy 
criterion for programs that relates secrecy and safety. 

2 A characterization of safety properties 

Consider a deterministic programming language with variables: 

(exp) $e ::= x \mid n \mid e_1 + e_2 \mid e_1 - e_2 \mid e_1 = e_2$

(cmd) $c ::= x := e \mid c_1; c_2 \mid \textbf{if}\ e\ \textbf{then}\ c_1\ \textbf{else}\ c_2 \mid \textbf{while}\ e\ \textbf{do}\ c$

* This material is based upon activities supported by the National Science Foundation 
under Agreement No. CCR-9612345 [sic]. 
Here x stands for a variable and n for an integer literal. Integers are the only
values; we use 0 for false and nonzero for true. Note that expressions do not have
side effects, nor do they contain partial operations like division.

A transition semantics is given for the language in Fig. 1. We assume that
expressions are evaluated atomically. Thus we simply extend a memory $\mu$ in the
obvious way to map expressions to integers, writing $\mu(e)$ to denote the value of
expression e in memory $\mu$.



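(update)    $\dfrac{x \in dom(\mu)}{(x := e, \mu) \longrightarrow \mu[x := \mu(e)]}$

(sequence)  $\dfrac{(c_1, \mu) \longrightarrow \mu'}{(c_1; c_2, \mu) \longrightarrow (c_2, \mu')}$

            $\dfrac{(c_1, \mu) \longrightarrow (c_1', \mu')}{(c_1; c_2, \mu) \longrightarrow (c_1'; c_2, \mu')}$

(branch)    $\dfrac{\mu(e) \neq 0}{(\textbf{if}\ e\ \textbf{then}\ c_1\ \textbf{else}\ c_2, \mu) \longrightarrow (c_1, \mu)}$

            $\dfrac{\mu(e) = 0}{(\textbf{if}\ e\ \textbf{then}\ c_1\ \textbf{else}\ c_2, \mu) \longrightarrow (c_2, \mu)}$

(loop)      $\dfrac{\mu(e) = 0}{(\textbf{while}\ e\ \textbf{do}\ c, \mu) \longrightarrow \mu}$

            $\dfrac{\mu(e) \neq 0}{(\textbf{while}\ e\ \textbf{do}\ c, \mu) \longrightarrow (c; \textbf{while}\ e\ \textbf{do}\ c, \mu)}$

Fig. 1. Transition semantics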



The rules define a transition relation $\longrightarrow$ on configurations. A configuration
m is either a pair $(c, \mu)$, where c is a command and $\mu$ is a memory, or simply
a memory $\mu$. We define the reflexive transitive closure $\longrightarrow^*$ in the usual way.
First $m \longrightarrow^0 m$, for any configuration m, and $m \longrightarrow^k m''$, for $k > 0$, if there is
a configuration $m'$ such that $m \longrightarrow^{k-1} m'$ and $m' \longrightarrow m''$. Then $m \longrightarrow^* m'$ if
$m \longrightarrow^k m'$ for some $k \ge 0$.

A trace is a (possibly infinite) derivation sequence $m_1 \longrightarrow m_2 \longrightarrow \cdots$ with
finite prefixes $m_1 \longrightarrow m_2$, $m_1 \longrightarrow m_2 \longrightarrow m_3$, and so on. And if $\sigma$ is a trace
then so is every prefix of $\sigma$.
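
The transition rules translate almost line for line into a small-step interpreter. The sketch below is a Python illustration; the tuple encoding of commands, the use of `None` for a configuration that is just a memory, and the step limit in `trace` are assumptions of this example.

```python
# Hypothetical encoding:
#   expressions: int | variable name (str) | (op, e1, e2), op in "+", "-", "="
#   commands: ("assign", x, e) | ("seq", c1, c2)
#             | ("if", e, c1, c2) | ("while", e, c)

def ev(mu, e):
    """mu(e): value of expression e in memory mu."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return mu[e]
    op, a, b = e
    x, y = ev(mu, a), ev(mu, b)
    if op == "+":
        return x + y
    if op == "-":
        return x - y
    return int(x == y)              # "=": nonzero means true


def step(c, mu):
    """One transition from (c, mu); returns (c', mu') or (None, mu')."""
    kind = c[0]
    if kind == "assign":            # (update)
        _, x, e = c
        if x not in mu:
            raise KeyError(f"{x} is not in dom(mu)")
        return None, {**mu, x: ev(mu, e)}
    if kind == "seq":               # (sequence), both rules
        _, c1, c2 = c
        c1_next, mu_next = step(c1, mu)
        if c1_next is None:
            return c2, mu_next
        return ("seq", c1_next, c2), mu_next
    if kind == "if":                # (branch)
        _, e, c1, c2 = c
        return (c1 if ev(mu, e) != 0 else c2), mu
    _, e, body = c                  # (loop)
    if ev(mu, e) == 0:
        return None, mu
    return ("seq", body, c), mu


def trace(c, mu, limit=1000):
    """Finite prefix of the trace from (c, mu), cut off after limit steps."""
    configs = [(c, mu)]
    while c is not None and len(configs) <= limit:
        c, mu = step(c, mu)
        configs.append((c, mu))
    return configs
```

Since the language is deterministic, `trace` returns the unique derivation sequence from a configuration; a diverging loop simply yields a prefix of length `limit + 1`.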

Definition 1. A safety property is a set S of traces such that for all traces $\sigma$,
$\sigma$ is in S iff every finite prefix of $\sigma$ is in S. A program is safe if every trace of it
belongs to S.

The “only-if” direction guarantees S is prefix closed, and the “if” direction allows
us to reject an infinite trace by examining only a finite amount of it. If there is
an infinite trace that is not in S, then it must have a finite prefix that is also
not in S. Hence safety cannot rule out behaviors that amount to reaching some
execution state infinitely often.

We also assume that the set of all finite traces in S is recursive. Although 
this need not be true of a safety property, it seems reasonable given that one 
typically identifies a safety property with the ability to enforce it at runtime by 
examining program traces of finite length. 
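
The link between safety and runtime enforcement can be sketched as an execution monitor. The Python code below is an illustration under assumptions: a trace is simplified to a sequence of memories (dictionaries), membership of finite traces in S is given by a decidable predicate `in_S`, and the property `never_negative` is a hypothetical example, not one from the paper.

```python
def monitor(trace, in_S):
    """Consume a trace step by step and stop at the first bad prefix.

    Returns (accepted_prefix, verdict). The verdict is False as soon as
    some finite prefix falls outside S, which is all a safety property
    ever requires us to detect. On an infinite trace that stays in S the
    loop does not terminate, as expected of a monitor.
    """
    prefix = []
    for config in trace:
        prefix.append(config)
        if not in_S(prefix):
            return prefix[:-1], False
    return prefix, True


def never_negative(prefix):
    """Hypothetical safety property: variable x is never negative."""
    return all(mu["x"] >= 0 for mu in prefix)


def countdown(start):
    """An infinite trace in which x decreases forever."""
    x = start
    while True:
        yield {"x": x}
        x -= 1
```

Running `monitor(countdown(2), never_negative)` halts after a finite number of steps even though the trace is infinite, because the violation is witnessed by a finite prefix.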

3 A characterization of secrecy 

We want to talk about secrecy in programs of our deterministic language so how 
should secrets be introduced? Well there is nothing intrinsically secret about any 
integer so we should forget about associating secrecy with values. Instead, we 
associate secrecy with the origin of a value which in our case will be the free 
variables of a program. So each variable is either high (secret) or low (public). 
The idea is that any initial value of a high variable is assumed to be secret 
merely by virtue of being stored in a high variable. The initial value of a low 
variable is not secret. 

This origin-view of secrecy differs from the view held by others working with
assorted lambda calculi and type systems for secrecy [1,3]. There, secrecy is asso- 
ciated with values like boolean constants. It does not seem sensible to attribute 
any level of security to such constants. After all, what exactly is a “high-security” 
boolean? Semantically, there is nothing that makes it high or low. Basic constants 
can be treated as high or low, and therefore we take the view that they should be 
typed polymorphically in any type system where levels of classification become 
(partially-ordered) types. 

We need to talk about secrecy violations. But what constitutes a violation?
Suppose k is a low variable and h is a high variable with initial value 17. Is
the assignment k := 17 in violation of secrecy? Presumably not, since it just got
lucky and does not reliably reveal the value of h as h varies. On the other hand,
k := h would be a violation.

As another example, consider 

k := h; k := k - h

Does it exhibit a violation? Despite the first assignment, we might still regard
the composition as secure since h is only temporarily stored in k which always
has final value zero. One might wonder though whether even temporary storage
is a violation. It would be if execution could be suspended for some reason, say
in an interleaved execution environment, and k's contents inspected. For now,
we shall stay with deterministic sequential programs and focus on what they are
capable of doing upon normal termination. In this case, the composition would
be secure. This also allows us to say that

h := k; k := h

too is secure since there is no way to update h between the assignments.
One can begin to see the subtlety in deciding what constitutes a secrecy vio-
lation. In the end, it comes down to what is observable by users and programs. 
Users can make external observations of running programs and system behav- 
ior on chosen inputs in order to learn secrets. Running time, resource usage, 
exceptions and so on are all valuable sources of information, provided by even 
well-designed programs, that can be observed outside a program and exploited. 
Programs, in contrast, make internal observations in that they are limited to 
whatever observations their semantics prescribe. Controlling these observations 
is much more tractable as long as implementations are faithful to the semantics^ 
and any program translation preserves the secrecy criterion of interest. With 
a semantics at least, we have a means of specifying and reasoning about the 
behaviors of programs and the observations they can make. We shall concern 
ourselves with internal observations only. This is still useful. For instance, it 
treats a Trojan Horse in mobile code that attempts to leak client secrets. 

So now that we have some intuition behind secrecy, how do we formalize it? 
There are a number of different techniques such as process calculi equivalence 
[2], a PER model [6], and operational formulations [8,9,10]. In order to contrast 
secrecy with safety, we give a trace-based description. It is useful to first define 
a notion of configuration equivalence. Memories $\mu$ and $\mu'$ are equivalent, written
$\mu \sim \mu'$, if $\mu(v) = \mu'(v)$ for all low variables v. And $(c, \mu) \sim (c', \mu')$ if c and c' are
syntactically equal and $\mu \sim \mu'$.

Definition 2. Secrecy is a binary relation R on traces where $R(\sigma, \sigma')$ is true
unless $\sigma$ has the form $m_1 \to m_2 \to \cdots \to \mu$, $\sigma'$ has the form
$m'_1 \to m'_2 \to \cdots \to \mu'$, $m_1 \sim m'_1$ and $\mu \not\sim \mu'$. A program is secret if R relates every
pair of its traces.

Basically, secrecy is asserting that the final value of any low variable does not 
depend on the initial values of high variables. This definition applies only to 
deterministic programs. Notice that a program may be secret even though it has 
a finite trace and an infinite trace whose starting configurations are equivalent. 
In other words, termination of a secret program can be affected by differences 
in the initial values of high variables. 
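
Definition 2 can be exercised on small programs by brute force. The following sketch is illustrative only and makes several assumptions that are not in the text: a program is modelled as a Python function from an initial memory (a dict holding the low variable k and the high variable h) to a final memory, every run terminates, and the search is restricted to a finite range of integers. The names `low_equiv` and `is_secret_on` are hypothetical. Passing the check on a finite range is evidence of secrecy, not a proof.

```python
from itertools import product

LOW = ("k",)  # low (public) variables; h is the only high variable


def low_equiv(mu1, mu2):
    """Memories are equivalent when they agree on every low variable."""
    return all(mu1[v] == mu2[v] for v in LOW)


def is_secret_on(program, values):
    """Look for two terminating runs that start equivalent and end inequivalent."""
    values = list(values)
    for k0, h1, h2 in product(values, repeat=3):
        final1 = program({"k": k0, "h": h1})
        final2 = program({"k": k0, "h": h2})
        # The initial memories agree on k, so the final ones must agree too.
        if not low_equiv(final1, final2):
            return False
    return True


def copy_then_cancel(mu):  # k := h; k := k - h
    mu = dict(mu)
    mu["k"] = mu["h"]
    mu["k"] = mu["k"] - mu["h"]
    return mu


def direct_leak(mu):  # k := h
    mu = dict(mu)
    mu["k"] = mu["h"]
    return mu


def branch_leak(mu):  # if h then k := 1 else k := 0
    mu = dict(mu)
    mu["k"] = 1 if mu["h"] != 0 else 0
    return mu
```

On the range -2..2 the first program passes the check while the other two fail it, matching the discussion of k := h; k := k - h and k := h in the text. Because only final memories are compared, differences in termination behaviour are invisible to this check, just as they are to Definition 2.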

4 Contrasting secrecy with safety 

Notice that secrecy relates program executions whereas a safety property does 
not. This is the essential difference between them. There are some interesting 
consequences of this difference in terms of enforcing secrecy versus safety. 

Suppose we take the view that a program may be unsafe but we won’t worry 
about its offending traces unless one of them tries to emerge during the current 
execution. So we don’t try to convince ourselves once and for all that a program 
is safe. Instead we accept the fact it may be unsafe and put our trust in an 
execution monitor to guard against unsafe behavior. This is an old idea from
operating systems. A monitor works by monitoring the execution of a program
and trapping it before it violates the policy being enforced [7]. It relies only on
information available at runtime and does not examine the entire program being
executed. Recovery from such traps may be possible in some applications. It is
in these applications that monitoring is appealing because the complement of
deciding whether a program is safe (call it unsafety) may be r.e. (recursively
enumerable) when safety is not. When safety is not r.e., we are immediately
faced with incompleteness in
any sound and r.e. logic for analyzing it. If the logic were complete then safety
would be r.e. since we would have a way to accept safe programs: simply hand
the given program off to the machine M accepting programs that have proofs in
the logic. If M accepts then we accept and we know we’re correct because the
logic is sound. And if the program is safe, then, by completeness, it has a proof
and therefore M will accept it. Incompleteness can be an obstacle in practice,
depending on the logic. Execution monitoring avoids it.

^ Knowing when an implementation is faithful can also be tricky.

Monitoring can also dovetail nicely with a machine M accepting unsafety. M 
might cycle through all memories (suitably encoded) and run a given program 
on each of them for at most some fixed number of steps, where the memories 
and number of steps are governed by pair generation. If the unsafe behavior 
reveals itself within the number of steps allowed (guaranteed to be detectable 
by M since the finite traces of a safety property form a recursive set), then M 
accepts. An execution monitor for unsafety is essentially a lazy version of M.
Eventually the monitor might decide that a given program is unsafe but that 
does not concern us unless the current run demonstrates it. 
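
The pair generation used by M can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction from the paper: memories are assumed to be indexed by natural numbers, `step_trace(i, n)` is a hypothetical function returning the first n steps of the run from the i-th memory, `is_unsafe_prefix` is the assumed decidable test on finite traces, and `max_pairs` exists only so that the demonstration stops (the machine M itself would keep searching).

```python
from itertools import count, islice


def pairs():
    """Enumerate every pair (i, n) of naturals, one diagonal at a time."""
    for total in count(0):
        for i in range(total + 1):
            yield i, total - i


def find_unsafe_run(step_trace, is_unsafe_prefix, max_pairs):
    """Try memory i for n steps, for each generated pair (i, n).

    Returns the first pair whose finite trace is recognised as unsafe,
    or None if none is found within max_pairs attempts.
    """
    for i, n in islice(pairs(), max_pairs):
        if is_unsafe_prefix(step_trace(i, n)):
            return i, n
    return None


# Toy instance: the run from memory i counts i, i+1, i+2, ...
# and a trace is deemed unsafe as soon as the value 7 appears in it.
def toy_trace(i, n):
    return list(range(i, i + n + 1))


def toy_unsafe(trace):
    return 7 in trace
```

Because every pair is eventually generated, any unsafe behavior that shows up after finitely many steps on some memory is eventually found; a monitor differs only in that it follows the single memory of the current run.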

We need to be clear on the terms soundness and completeness. The monitor 
is a lazy version of M which accepts unsafety. Since M accepts unsafety, we 
have that if a program is unsafe then M will say so (completeness) and if M 
says a program is unsafe, it is indeed unsafe (soundness). Therefore, if M never 
says a program is unsafe then the program is safe. This we take as a soundness 
criterion for M (and the monitor) with respect to the safety property at hand. 
Likewise, if a program is safe, then M never says otherwise. And this we take as 
a completeness criterion for M relative to safety. 

A similar technique can be used to prove that the complement of deciding
whether a program is secret is r.e. One can encode a pair of memories and adopt
some convention for determining values of low variables, and then run the given 
program for at most a fixed number of steps on each memory in a generated pair 
when the memories are equivalent. If the runs terminate yielding inequivalent 
memories, then accept. But unlike the complement of safety, the technique here 
does not dovetail with execution monitoring because it requires two memories. 
Monitoring involves only one, that of the current execution. So for this notion 
of secrecy, monitoring cannot be employed as a way to guard against secrecy 
violations as it was used to guard against safety violations. In fact, we can be 
more rigorous. As we shall see, one can prove there is no policy, implemented by 
an execution monitor, that implies secrecy and is complete. In contrast, there 
are many safety properties that have sound and complete execution monitors. 
So what alternatives are there for enforcing secrecy? 







One approach is to turn to a static analysis whereby we attempt to show 
once and for all that a given program is secret. But we will be faced with in- 
completeness in any sound and r.e. system for reasoning about secrecy because 
determining whether a program is secret is undecidable. Decidable type sys- 
tems fall into this category [10]. Instead, one may adopt a very expressive logic 
and use verification conditions for establishing secrecy without worrying about 
mechanizing proofs. Work along these lines is described in [4]. 

In the next section, we shall see an example of a program secrecy crite- 
rion implied by a policy that is implemented using an execution monitor. It is 
called weak secrecy. A disadvantage of weak secrecy is that it ignores indirect 
dependencies caused by branching — hence the term “weak”. As a result, some 
programs satisfy weak secrecy but are not secret. But there are also secret
programs that do not satisfy weak secrecy, reflecting a basic requirement of safety.
So neither property implies the other. The monitor is sound but incomplete for 
weak secrecy. It may trap a program that satisfies weak secrecy. 

5 Weak secrecy 

Every trace has a corresponding branch-free program formed by sequencing up- 
dates from those steps of the trace whose derivations are rooted with updates. 
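For instance, if $k, h \in \mathrm{dom}(\mu)$ and $\mu(h) = 0$, then corresponding to the trace

$$
\begin{array}{lll}
 & (k := h;\ \textbf{if}\ h\ \textbf{then}\ k := 1\ \textbf{else}\ k := 0,\ \mu) & (m_1) \\
\longrightarrow & (\textbf{if}\ h\ \textbf{then}\ k := 1\ \textbf{else}\ k := 0,\ \mu[k := \mu(h)]) & (m_2) \\
\longrightarrow & (k := 0,\ \mu[k := \mu(h)]) & (m_3) \\
\longrightarrow & \mu[k := \mu(h)][k := 0] & (m_4)
\end{array}
$$

is the branch-free program k := h; k := 0. Notice that by rules (loop) and
(branch), the corresponding program for a trace may be empty.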

Now we say that a program is weakly secret if every trace of it has a secret 
branch-free program. For instance, the program in the preceding example is not 
weakly secret. Traces $m_1 \to m_2$ and $m_1 \to m_2 \to m_3$ do not have secret
branch-free programs, but $m_1 \to m_2 \to m_3 \to m_4$ does. It may seem that
we still have not defined a criterion for program secrecy that follows from some
policy implemented by execution monitoring since we still cast our definition in 
terms of secrecy which relates program executions. But there is a policy that 
implies weak secrecy and it can be implemented by an execution monitor. 

5.1 A policy for weak secrecy and its monitor 

An execution monitor is given in Fig. 2 as a set of rules governing transitions 
that the monitor can make. Each transition has the form

$$(c, \mu),\ q \longrightarrow m,\ q'$$

where q and q' are states {k} or {k,h}. The monitor is equipped to handle
executions of programs with only two variables, namely k and h, which are low 
and high variables respectively. A state indicates those variables whose values 
at that point are independent of initial values of h. This of course may change 
during execution depending upon updates to h. 

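The policy is captured by the (update) rules. If we take state {k} to be the
initial state, then the third (update) rule, for instance, allows a transition to
state {k, h} because h is the target of the assignment and h does not occur in the
right side e. Thereafter, h is treated as a low variable in state {k, h}. Notice that
once an evaluation reaches state {k, h}, it remains in {k, h} thereafter. In state
{k, h}, the monitor no longer has any effect on executions. This is where the
semantics of Fig. 1 and the monitor merge in the sense that for every command
c, $(c, \mu) \to m$ if and only if $(c, \mu), \{k, h\} \to m, \{k, h\}$.

$$
\begin{array}{ll}
\text{(update)} &
\dfrac{k \in \mathrm{dom}(\mu),\quad h \notin e}
      {(k := e,\ \mu),\ \{k\} \longrightarrow \mu[k := \mu(e)],\ \{k\}} \\[2ex]
 & \dfrac{h \in \mathrm{dom}(\mu),\quad h \in e}
      {(h := e,\ \mu),\ \{k\} \longrightarrow \mu[h := \mu(e)],\ \{k\}} \\[2ex]
 & \dfrac{h \in \mathrm{dom}(\mu),\quad h \notin e}
      {(h := e,\ \mu),\ \{k\} \longrightarrow \mu[h := \mu(e)],\ \{k, h\}} \\[2ex]
 & \dfrac{x \in \mathrm{dom}(\mu)}
      {(x := e,\ \mu),\ \{k, h\} \longrightarrow \mu[x := \mu(e)],\ \{k, h\}} \\[2ex]
\text{(sequence)} &
\dfrac{(c_1,\ \mu),\ q \longrightarrow \mu',\ q'}
      {(c_1; c_2,\ \mu),\ q \longrightarrow (c_2,\ \mu'),\ q'} \\[2ex]
 & \dfrac{(c_1,\ \mu),\ q \longrightarrow (c_1',\ \mu'),\ q'}
      {(c_1; c_2,\ \mu),\ q \longrightarrow (c_1'; c_2,\ \mu'),\ q'} \\[2ex]
\text{(branch)} &
\dfrac{\mu(e) \neq 0}
      {(\textbf{if}\ e\ \textbf{then}\ c_1\ \textbf{else}\ c_2,\ \mu),\ q \longrightarrow (c_1,\ \mu),\ q} \\[2ex]
 & \dfrac{\mu(e) = 0}
      {(\textbf{if}\ e\ \textbf{then}\ c_1\ \textbf{else}\ c_2,\ \mu),\ q \longrightarrow (c_2,\ \mu),\ q} \\[2ex]
\text{(loop)} &
\dfrac{\mu(e) = 0}
      {(\textbf{while}\ e\ \textbf{do}\ c,\ \mu),\ q \longrightarrow \mu,\ q} \\[2ex]
 & \dfrac{\mu(e) \neq 0}
      {(\textbf{while}\ e\ \textbf{do}\ c,\ \mu),\ q \longrightarrow (c;\ \textbf{while}\ e\ \textbf{do}\ c,\ \mu),\ q}
\end{array}
$$

Fig. 2. An execution monitor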

Remark 1. One can think of the states in a transition as inherited and synthe- 
sized attributes. Generalizing the execution monitor to handle more variables 
can be done by introducing a set I of variables, each of whose value is in- 
dependent of the initial value of any high variable. There would actually be 
only two (update) rules. The first rule’s hypothesis would, for an assignment 
x := e, require $\forall h \in high,\ h \in I \lor h \notin e$. Its synthesized attribute would be
$I \cup \{x\}$. The second rule’s hypothesis would require that x is a high variable and
$\exists h \in high,\ h \notin I \land h \in e$. Its synthesized attribute would be simply I.

We can regard a set of traces of the monitor as a safety property related to 
secrecy by the following theorem: 

Theorem 1. Let $\sigma$ be a trace of the monitor starting in state {k}. Then every
finite prefix of $\sigma$ has a secret branch-free program.

If the monitor never traps a given program on any input, when started in 
state {k}, then the program is weakly secret. However, a program may be weakly 
secret yet get trapped (e.g. k := h - h). The monitor also traps a secret program:

k := h; k := k - h

Here there is a trace whose branch-free program is just k := h which is not secret. 
One might consider altering the monitor in some way to admit all executions of 
this program but then its traces would no longer be prefix closed as a trace for 
k := h would not exist if the monitor’s policy implies secrecy. It follows then that 
there is no monitor-enforced policy that is sound and complete for secrecy since 
the set of all traces of every monitor is prefix closed. Simply put, if the monitor 
executes k := h, then it’s unsound, and if it doesn’t, then it’s incomplete. 

The monitor also ignores indirect dependencies. For instance, it does not trap 

if h then k := 1 else k := 0 

even though the program is not secret. 
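
The behavior of the two-variable monitor on these examples can be reproduced with a short interpreter. This is a sketch under stated assumptions rather than the monitor of Fig. 2 itself: it executes commands recursively instead of step by step, it omits the dom checks, and the command encoding (tuples tagged "assign", "seq", "if", "while"), the helper `expr`, and the names `run`, `trapped` and `Trap` are all hypothetical. An expression is paired with the set of variables occurring in it so that the test "h occurs in e" can be made at runtime.

```python
class Trap(Exception):
    """Raised when no (update) rule applies to the next assignment."""


def expr(fn, *variables):
    """An expression: an evaluation function plus the variables occurring in it."""
    return (fn, frozenset(variables))


def run(cmd, mu, q=frozenset({"k"})):
    """Execute cmd on memory mu under the monitor; return (final memory, state)."""
    mu = dict(mu)
    kind = cmd[0]
    if kind == "assign":
        _, x, (fn, occurs) = cmd
        if q == {"k", "h"}:
            pass                           # fourth (update) rule: anything goes
        elif x == "k" and "h" not in occurs:
            pass                           # first (update) rule
        elif x == "h" and "h" in occurs:
            pass                           # second (update) rule
        elif x == "h":
            q = frozenset({"k", "h"})      # third (update) rule
        else:
            raise Trap("k := e with h occurring in e, in state {k}")
        mu[x] = fn(mu)
        return mu, q
    if kind == "seq":
        mu, q = run(cmd[1], mu, q)
        return run(cmd[2], mu, q)
    if kind == "if":
        _, (fn, _), c1, c2 = cmd
        return run(c1 if fn(mu) != 0 else c2, mu, q)
    if kind == "while":
        _, (fn, _), body = cmd
        while fn(mu) != 0:
            mu, q = run(body, mu, q)
        return mu, q
    raise ValueError(kind)


def trapped(cmd, mu):
    """True if the monitor traps cmd when started on mu in state {k}."""
    try:
        run(cmd, mu)
        return False
    except Trap:
        return True


H = expr(lambda m: m["h"], "h")
LEAK = ("assign", "k", H)                                         # k := h
CANCEL = ("assign", "k", expr(lambda m: m["h"] - m["h"], "h"))    # k := h - h
BRANCH = ("if", H, ("assign", "k", expr(lambda m: 1)),
          ("assign", "k", expr(lambda m: 0)))                     # indirect flow
RESET = ("seq", ("assign", "h", expr(lambda m: 0)), LEAK)         # h := 0; k := h
```

Under these assumptions the interpreter traps k := h and k := h - h, runs the conditional without complaint, and accepts h := 0; k := h because the first assignment moves the monitor into state {k, h}.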

6 Concluding remarks 

Execution monitoring has been a useful mechanism for implementing various 
policies. It is important to distinguish policies from properties. A policy implies 
a property, and in some cases, may be more restrictive than it needs to be in 
order to imply the property. The execution monitor presented here implements 
a policy that implies weak secrecy in the sense that if it never traps a given 
program on any input, when started in state {k}, then the program is weakly
secret. It does not however imply secrecy. In fact, no policy implemented by an 
execution monitor can imply secrecy and be complete.
An interesting direction to pursue is completeness of the monitor for weak
secrecy, that is, trying to extend the monitor so that it never traps a weakly secret 
program. Doing this for a more realistic set of expressions would be challenging. 
We assumed that expressions are executed atomically and that the monitor can 
inspect an expression at runtime. But expressions obviously can be far more 
complex, involving function calls, conditional expressions, exceptions and side 
effects. One cannot assume these sorts of expressions execute atomically.

Abstract. This paper investigates bottom-up logic programming as a
formalism for expressing static analyses. The main technical contribution 
consists of two meta-complexity theorems which allow, in many cases, 
the asymptotic running time of a bottom-up logic program to be deter- 
mined by inspection. It is well known that a datalog program runs in 
$O(n^k)$ time where k is the largest number of free variables in any single
rule. The theorems given here are significantly more refined. A variety of 
algorithms given as bottom-up logic programs are analyzed as examples. 



1 Introduction 

This paper attempts to make four basic points concerning the use of bottom-up 
logic programming as a formalism for expressing static analysis algorithms. First, 
a bottom-up logic program is a set of inference rules and most static analysis 
algorithms have natural representations as inference rules. Second, bottom-up 
logic programs are machine-readable and can be automatically compiled into 
a running algorithm. Third, it is possible to prove meta-complexity theorems 
which allow, in many cases, the running time of a bottom-up logic program
to be determined by inspection. Fourth, it appears that most program analyses
can be performed by natural bottom-up logic programs whose running time, as
given by the meta-complexity theorems, is either the best known or within a
polylog factor of the best known.

We use the term “inference rule” to mean first order Horn clause, i.e. a first 
order formula of the form $A_1 \wedge \ldots \wedge A_n \to C$ where C and each $A_i$ is a first order
atom, i.e., a predicate applied to first order terms. First order Horn clauses form
a Turing complete model of computation and can be used in practice as a general
purpose programming language. The atoms $A_1, \ldots, A_n$ are called the antecedents
of the rule and the atom C is called the conclusion. When using inference rules as 
a programming language one represents arbitrary data structures as first order 
terms. For example, one can represent terms of the lambda calculus or arbitrary 
formulas of first order logic as first order terms in the underlying programming 
language. The restriction to first order terms in no way rules out the construction 
of rules defining static analyses for higher order languages.

There are two basic ways to view a set of inference rules as an algorithm: the
backward chaining approach taken in traditional Prolog interpreters [12,4] and
the forward chaining, or bottom-up, approach common in deductive databases
[22,21,17]. Meta-complexity analysis derives from the bottom-up approach. As a
simple example consider the rule $P(x, y) \wedge P(y, z) \to P(x, z)$ which states that
the binary predicate P is transitive. Let D be a set of assertions of the form
P(c, d) where c and d are constant symbols. More generally we will use the term
assertion to mean a ground atom, i.e., an atom not containing variables, and use the
term database to mean a set of assertions. For any set R of inference rules and
any database D we let R(D) denote the set of assertions that can be proved in
the obvious way from assertions in D using rules in R. If R consists of the above
rule for transitivity, and D consists of assertions of the form P(c, d), then R(D)
is simply the transitive closure of P. In the bottom-up view a rule set R is taken
to be an algorithm for computing output R(D) from input D.
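
To make the bottom-up reading concrete, the following sketch computes R(D) for the single transitivity rule by naive fixed-point iteration. It is only an illustration: the function name `transitive_closure` and the encoding of an assertion P(c, d) as the pair (c, d) are assumptions, and the loop makes no attempt at the efficient evaluation strategies analysed later in the paper.

```python
def transitive_closure(db):
    """Compute R(D) for the rule P(x, y) and P(y, z) implies P(x, z).

    db is a set of pairs (c, d), each standing for the assertion P(c, d).
    """
    closed = set(db)
    changed = True
    while changed:
        changed = False
        # Try every pair of known assertions as the two antecedents.
        for (x, y) in list(closed):
            for (y2, z) in list(closed):
                if y == y2 and (x, z) not in closed:
                    closed.add((x, z))   # a new conclusion was derived
                    changed = True
    return closed


CHAIN = {("a", "b"), ("b", "c"), ("c", "d")}
```

Applied to the three-edge chain above, the result adds P(a, c), P(b, d) and P(a, d) to the input, and running the function again on its own output changes nothing.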

Here we are interested in methods for quickly determining the running time
of a rule set R, i.e., the time required to compute R(D) from D. For example,
consider the following “algorithm” for computing the transitive closure of
a predicate EDGE defined by the bottom-up rules $\mathrm{EDGE}(x, y) \to \mathrm{PATH}(x, y)$ and
$\mathrm{EDGE}(x, y) \wedge \mathrm{PATH}(y, z) \to \mathrm{PATH}(x, z)$. If the input graph contains e edges
and n nodes this algorithm runs in $O(en)$ time, significantly better than $O(n^3)$ for sparse graphs.
Note that the $O(en)$ running time cannot be derived by simply counting the
number of variables in any single rule. Section 4 gives a meta-complexity theorem
which applies to arbitrary rule sets and which allows the $O(en)$ running time of
this algorithm to be determined by inspection. For this simple rule set the $O(en)$
running time may seem obvious, but examples are given throughout the paper
where the meta-complexity theorem can be used in cases where a completely
rigorous treatment of the running time of a rule set would be otherwise tedious.
The meta-theorem proved in section 4 states that R(D) can be computed in time
proportional to the number of “prefix firings” of the rules in R, that is, the number
of derivable ground instances of prefixes of rule antecedents. This theorem holds
for arbitrary rule sets, no matter how complex the antecedents or how many
antecedents rules have, provided that every variable in the conclusion of a rule
appears in some antecedent of that rule.

Before presenting the first significant meta-complexity theorem in section 4, 
section 3 reviews a known meta-complexity theorem based on counting the num- 
ber of variables in a single rule. This can be used for “syntactically local” rule 
sets — ones in which every term in the conclusion of a rule appears in some 
antecedent. Some other basic properties of syntactically local rule sets are also 
mentioned briefly in section 3 such as the fact that syntactically local rule sets 
can express all and only polynomial time decidable term languages. 

Section 5 gives a series of examples of program analysis algorithms expressed
as bottom-up logic programs. The first example is basic data flow. This algorithm
computes a “dynamic transitive closure”, a transitive closure operation
in which new edges are continually added to the underlying graph as the
computation proceeds. Many such dynamic transitive closure algorithms can be shown
to be 2NPDA-complete [10,16]. 2NPDA is the class of languages that can be
recognized by a two-way nondeterministic pushdown automaton. A problem is
2NPDA-complete if it is in the class 2NPDA and furthermore has the property
that if it can be solved in sub-cubic time then any problem in 2NPDA can also be
solved in sub-cubic time. No 2NPDA-complete problem is known to be solvable
in sub-cubic time. Section 5 also presents a linear time sub-transitive data flow
algorithm which can be applied to programs typable with non-recursive data
types of bounded size and a combined control and data flow analysis algorithm
for the $\lambda$-calculus. In all these examples the meta-complexity theorem of
section 4 allows the running time of the algorithm to be determined by inspection
of the rule set.

Section 6 presents the second main result of this paper: a meta-complexity
theorem for an extended bottom-up programming language incorporating the
union-find algorithm. Three basic applications of this meta-complexity theorem
for union-find rules are presented in section 7: a unification algorithm, a
congruence closure algorithm, and a type inference algorithm for the simply typed
$\lambda$-calculus. Section 8 presents Henglein’s quadratic time algorithm for typability
in a version of the Abadi-Cardelli object calculus [11]. This example is
interesting for two reasons. First, the algorithm is not obvious: the first published
algorithm for this problem used an $O(n^3)$ dynamic transitive closure algorithm
[18]. Second, Henglein’s presentation of the quadratic algorithm uses classical
pseudo-code and is fairly complex. Here we show that the algorithm can be
presented naturally as a small set of inference rules whose $O(n^2)$ running time is
easily derived from the union-find meta-complexity theorem.

2 Terminology and Assumptions 

As mentioned in the introduction, we will use the term assertion to mean a
ground atom, i.e., an atom not containing variables, and use the term database to
mean a set of assertions. Also as mentioned in the introduction, for any rule set
R and database D we let R(D) be the set of ground assertions derivable from
D using rules in R. We write $D \vdash_R \Phi$ as an alternative notation for $\Phi \in R(D)$.
We use $|D|$ for the number of assertions in D and $\|D\|$ for the number of distinct
ground terms appearing either as arguments to predicates in assertions in D or
as subterms of such arguments.

A ground substitution is a mapping from a finite set of variables to ground
terms. In this paper we consider only ground substitutions. If $\sigma$ is a ground
substitution defined on all the variables occurring in a term t then $\sigma(t)$ is defined in
the standard way as the result of replacing each variable by its image under $\sigma$.
We also assume that all expressions, both terms and atoms, are represented
as interned dag data structures. This means that the same term is always
represented by the same pointer to memory so that equality testing is a unit time
operation. Furthermore, we assume that hash table operations take unit time
so that for any substitution $\sigma$ defined (only) on x and y we can compute (the
pointer representing) $\sigma(f(x, y))$ in unit time. Note that interned expressions
support indexing. For example, given a binary predicate P we can index all
assertions of the form P(t, w) so that the data structure representing t points to a
list of all terms w such that P(t, w) has been asserted and, conversely, all terms
w point to a list of all terms t such that P(t, w) has been asserted.
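
The interning and indexing assumptions can be illustrated with a small hash-consing table. This is a sketch of one possible realisation, not the representation used in the paper: the class names `Interner` and `Index`, the encoding of a term as a (functor, arguments) tuple, and the use of Python object identity in place of a memory pointer are all assumptions made for the example.

```python
from collections import defaultdict


class Interner:
    """Hash-consing table: structurally equal terms share one object."""

    def __init__(self):
        self._table = {}

    def term(self, functor, *args):
        # The arguments are already interned, so their identities
        # determine the term; the key has size proportional to the arity.
        key = (functor,) + tuple(id(a) for a in args)
        node = self._table.get(key)
        if node is None:
            node = (functor, args)
            self._table[key] = node   # the table keeps every node alive
        return node


class Index:
    """Two-way index over assertions P(t, w), keyed by term identity."""

    def __init__(self):
        self.by_first = defaultdict(list)    # t -> all w with P(t, w)
        self.by_second = defaultdict(list)   # w -> all t with P(t, w)

    def assert_p(self, t, w):
        self.by_first[id(t)].append(w)
        self.by_second[id(w)].append(t)
```

With such a table, building f(a, b) twice yields the same object, so equality of terms reduces to an identity comparison, and looking up the partners of a term in the index is a single dictionary access.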

We are concerned here with rules which are written with the intention of 
defining bottom-up algorithms. Intuitively, in a bottom-up logic program any 
variable in the conclusion that does not appear in any antecedent is “unbound” 
— it will not have any assigned value when the rule runs. Although unbound 
variables in conclusions do have a well defined semantics — any value is ac- 
ceptable — and although it is possible to define an interpreter which correctly 
handles such unbound variables, when writing rules to be used in a bottom-up 
way it is always possible to avoid such variables. A rule in which all variables 
in the conclusion appear in some antecedent will be called bottom-up bound. In
this paper we consider only bottom-up bound inference rules. 

A datalog rule is one that does not contain terms other than variables. A 
syntactically local rule is one in which every term in the conclusion appears 
in some antecedent — either as an argument to a predicate or as a subterm 
of such an argument. Every syntactically local rule is bottom-up bound and 
every bottom-up bound datalog rule is syntactically local. However, the rule 
$P(x) \to P(s(x))$ is bottom-up bound but not syntactically local. Note that the
converse, $P(s(x)) \to P(x)$, is syntactically local.
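
The three syntactic classes just defined are easy to check mechanically. The sketch below does so under an assumed encoding that is not part of the paper: a variable is a Python string, any other term is a tuple whose first component is the function symbol, an atom is a tuple whose first component is the predicate symbol, and a rule is a pair of an antecedent list and a conclusion. All function names are hypothetical.

```python
def is_var(t):
    """Variables are strings; every other term is a (functor, args...) tuple."""
    return isinstance(t, str)


def subterms(t):
    """Yield t together with all of its subterms."""
    yield t
    if not is_var(t):
        for arg in t[1:]:
            yield from subterms(arg)


def variables(atom):
    """The set of variables occurring anywhere in an atom."""
    return {s for arg in atom[1:] for s in subterms(arg) if is_var(s)}


def bottom_up_bound(antecedents, conclusion):
    """Every variable of the conclusion occurs in some antecedent."""
    bound = set().union(*(variables(a) for a in antecedents))
    return variables(conclusion) <= bound


def syntactically_local(antecedents, conclusion):
    """Every term of the conclusion occurs in some antecedent."""
    seen = {s for a in antecedents for arg in a[1:] for s in subterms(arg)}
    return all(s in seen for arg in conclusion[1:] for s in subterms(arg))


def is_datalog(antecedents, conclusion):
    """The rule contains no terms other than variables."""
    return all(is_var(arg) for a in antecedents + [conclusion] for arg in a[1:])


GROW = ([("P", "x")], ("P", ("s", "x")))        # P(x) -> P(s(x))
SHRINK = ([("P", ("s", "x"))], ("P", "x"))      # P(s(x)) -> P(x)
TRANS = ([("P", "x", "y"), ("P", "y", "z")], ("P", "x", "z"))
```

Under this encoding the checks agree with the examples in the text: the rule that wraps its argument in s is bottom-up bound but not syntactically local, its converse is syntactically local, and the transitivity rule is a datalog rule that is both.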

3 Local Rule Sets 

Before giving the main results of this paper, which apply to arbitrary rule sets, 
we give a first “naive” meta-complexity theorem. This theorem applies only to 
syntactically local rule sets. Because every term in the conclusion of a syntacti- 
cally local rule appears in some antecedent, it follows that a syntactically local 
rule can never introduce a new term. This implies that if R is syntactically local 
then for any database D we have that R{D) is finite. More precisely, we have 
the following. 

Theorem 1. If R is syntactically local then R(D) can be computed in $O(|D| + \|D\|^k)$
time where k is the largest number of variables occurring in any single rule.

To prove the theorem one simply notes that it suffices to consider the set of 
ground Horn clauses consisting of the assertions in D (as unit clauses) plus all
instances of the rules in R in which all terms appear in D. There are $O(\|D\|^k)$
such instances. Computing the inferential closure of a set of ground clauses can 
be done in linear time [5]. 

As the $O(en)$ transitive closure example in the introduction shows, theorem 1
provides only a crude upper bound on the running time of inference rules. Before
presenting the second meta-complexity theorem, however, we briefly mention
some additional properties of local rule sets that are not used in the remainder of
the paper but are included here for the sake of completeness. The first property
is that syntactically local rule sets capture the complexity class P. We say that
a rule set R accepts a term t if $\mathrm{INPUT}(t) \vdash_R \mathrm{ACCEPT}(t)$. The above theorem
implies that the language accepted by a syntactically local rule set is polynomial
time decidable. The following less trivial theorem is proved in [7]. It states the
converse: any polynomial time property of first-order terms can be encoded
as a syntactically local rule set.

Theorem 2 (Givan & McAllester). If $\mathcal{L}$ is a polynomial time decidable term
language then there exists a syntactically local rule set which accepts exactly the
terms in $\mathcal{L}$.

The second subject we mention briefly is what we will call here semantic
locality. A rule set R will be called semantically local if whenever $D \vdash_R \Phi$
there exists a derivation of $\Phi$ from assertions in D using rules in R such that
every term in that derivation appears in D. Every syntactically local rule set
is semantically local. By the same reasoning used to prove theorem 1, if R is
semantically local then R(D) can be computed in $O(|D| + \|D\|^k)$ time where k
is the largest number of variables in any single rule. In many cases it is possible to
mechanically show that a given rule set is semantically local even though it is not
syntactically local [14,2]. However, semantic locality is in general an undecidable
property of rule sets [7].

4 A Second Meta-Complexity Theorem 

We now prove our second meta-complexity theorem. We will say that a data
base E is closed under a rule set R if R(E) = E. It would seem that determining
closedness would be easier than computing the closure in cases where we are not
yet closed. The meta-complexity theorem states, in essence, that the closure can
be computed quickly: it can be computed in the time needed to merely check
the closedness of the final result. Consider a rule $A_1 \wedge \ldots \wedge A_n \to C$. To check that
a data base E is closed under this rule one can compute all ground substitutions
$\sigma$ such that $\sigma(A_1), \ldots, \sigma(A_n)$ are all in E and then check that $\sigma(C)$ is also in E.
To find all such substitutions we can first match the pattern $A_1$ against assertions
in the data base to get all substitutions $\sigma_1$ such that $\sigma_1(A_1) \in E$. Then given
$\sigma_i$ such that $\sigma_i(A_1), \ldots, \sigma_i(A_i)$ are all in E we can match $\sigma_i(A_{i+1})$ against the
assertions in the data base to get all extensions $\sigma_{i+1}$ such that
$\sigma_{i+1}(A_1), \ldots, \sigma_{i+1}(A_{i+1})$ are in E. This method of computing the antecedent substitutions
involves at least one step of computation for each substitution $\sigma_i$ on the free
variables of $A_1, \ldots, A_i$ such that $\sigma_i(A_1), \ldots, \sigma_i(A_i)$ are all in E. Each such $\sigma_i$
determines a “prefix firing” of the rule as defined below.

Definition 1. We define a prefix firing of a rule $A_1, \ldots, A_n \to C$ in a rule set
R under data base E to be a ground instance $B_1, \ldots, B_i$ of an initial sequence
$A_1, \ldots, A_i$, $i \le n$, such that $B_1, \ldots, B_i$ are all contained in E. We let $P_R(E)$
be the set of all prefix firings of rules in R for data base E.
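
Prefix firings can be enumerated directly from the definition. The sketch below does this for a single datalog rule and is only illustrative: atoms are assumed to be tuples whose first component is the predicate symbol, antecedent arguments are assumed to be variables, and the names `match` and `prefix_firings` are hypothetical. Each extension step scans the whole data base, so this is the naive closure check and not the efficient algorithm of the meta-complexity theorem.

```python
def match(pattern, fact, sigma):
    """Extend substitution sigma so that pattern becomes fact, or return None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    sigma = dict(sigma)
    for var, value in zip(pattern[1:], fact[1:]):
        # setdefault binds an unbound variable and returns the old
        # binding otherwise, so a clash shows up as an inequality.
        if sigma.setdefault(var, value) != value:
            return None
    return sigma


def prefix_firings(antecedents, db):
    """Return all prefix firings of one datalog rule under data base db.

    Each firing is a tuple (B_1, ..., B_i) of assertions from db that
    instantiates the first i antecedents under a single substitution.
    """
    firings = []
    frontier = [({}, ())]   # pairs of (substitution, matched assertions)
    for pattern in antecedents:
        extended = []
        for sigma, matched in frontier:
            for fact in db:
                new_sigma = match(pattern, fact, sigma)
                if new_sigma is not None:
                    extended.append((new_sigma, matched + (fact,)))
        firings.extend(m for _, m in extended)
        frontier = extended
    return firings


# Antecedents of a rule P(x, y), P(y, z), R(z) -> P(x, z) and a small data base.
RULE = [("P", "x", "y"), ("P", "y", "z"), ("R", "z")]
DB = {("P", "a", "b"), ("P", "b", "c"), ("P", "c", "d")}
```

For this data base the rule has three firings of its first antecedent, two of its first two antecedents, and none of all three, because no R assertion is present.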

Note that the rule $P(x, y) \wedge P(y, z) \wedge R(z) \to P(x, z)$ might have a large
number of firings for the first two antecedents while having no firings of all three
antecedents. The simple algorithm outlined above for checking that E is closed
under R requires at least $|P_R(E)|$ steps of computation. As outlined above, the
closure check algorithm would actually require more time because each step of
extending $\sigma_i$ to $\sigma_{i+1}$ involves iterating over the entire data base. The following
theorem states, essentially, that we can compute R(D) in the time it would take
to check that R(D) is closed under R even if someone else computed R(D) and
even ignoring the iteration over the data base used to compute $\sigma_{i+1}$ from $\sigma_i$.

Theorem 3. For any set R of bottom-up bound inference rules there exists an
algorithm for mapping D to R(D) which runs in $O(|D| + |P_R(R(D))|)$ time.

Proof. The proof is based on a source to source transformation of the given
program. We note that each of the following source to source transformations
on inference rules preserves the quantity $|D| + |P_R(R(D))|$ (as a function of D)
up to a multiplicative constant. In the second transformation note that there
must be at least one element of D or $P_R(R(D))$ for each assertion in R(D).
Hence adding any rule with only a single antecedent and with a fresh predicate
in the conclusion at most doubles the value of $|D| + |P_R(R(D))|$. The second
transformation can then be done in two steps: first we add the new rule and
then replace the antecedent in the existing rule. A similar analysis holds for the
third transformation.

$A_1, A_2, A_3, \ldots, A_n \to C$

$\Downarrow$

$A_1, A_2 \to H(x_1, \ldots, x_n), \qquad H(x_1, \ldots, x_n), A_3, \ldots, A_n \to C$

where $x_1, \ldots, x_n$ are all free variables in $A_1$ and $A_2$.

$A_1, \ldots, P(t_1, \ldots, t_n), \ldots, A_n \to C$

$\Downarrow$

$P(t_1, \ldots, t_n) \to Q(x_1, \ldots, x_m), \qquad A_1, \ldots, Q(x_1, \ldots, x_m), \ldots, A_n \to C$

where at least one of the $t_i$ is a non-variable and $x_1, \ldots, x_m$ are all the free variables
in $t_1, \ldots, t_n$.

$P(x_1, \ldots, x_n), Q(y_1, \ldots, y_m) \to C$

$\Downarrow$

$P(x_1, \ldots, x_n) \to P'(f(z_1, \ldots, z_k), g(w_1, \ldots, w_h))$

$Q(y_1, \ldots, y_m) \to Q'(g(w_1, \ldots, w_h), u(v_1, \ldots, v_j))$

$P'(x, y), Q'(y, z) \to R(x, y, z)$

$R(f(z_1, \ldots, z_k), g(w_1, \ldots, w_h), u(v_1, \ldots, v_j)) \to C$

where $z_1, \ldots, z_k$ are those variables among the $x_i$s which are not among the
$y_i$s; $w_1, \ldots, w_h$ are those variables that occur both in the $x_i$s and $y_i$s; and
$v_1, \ldots, v_j$ are those variables among the $y_i$s that are not among the $x_i$s.

These transformations allow us to assume without loss of generality that the
only multiple antecedent rules are of the form $P(x, y), Q(y, z) \to R(x, y, z)$.
For each such multiple antecedent rule we create an index such that for each
$y$ we can enumerate the values of $x$ such that $P(x, y)$ has been asserted and
also enumerate the values of $z$ such that $Q(y, z)$ has been asserted. When a
new assertion of the form $P(x, y)$ or $Q(y, z)$ is derived we can now iterate over
the possible values of the missing variable in time proportional to the number
of such values.
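
The indexing step at the end of the proof can be sketched as follows. This is an illustration under stated assumptions rather than the construction used in the proof: the function name `join_closure` and the encoding of assertions as Python tuples are hypothetical. It handles only the normalized rule $P(x, y), Q(y, z) \to R(x, y, z)$.

```python
from collections import defaultdict


def join_closure(assertions):
    """Fire P(x, y), Q(y, z) -> R(x, y, z) as assertions arrive one at a time.

    `assertions` is an iterable of ("P", x, y) or ("Q", y, z) tuples.
    """
    xs_by_y = defaultdict(set)   # y -> all x with P(x, y) asserted so far
    zs_by_y = defaultdict(set)   # y -> all z with Q(y, z) asserted so far
    derived = set()
    for pred, a, b in assertions:
        if pred == "P":
            x, y = a, b
            if x not in xs_by_y[y]:
                xs_by_y[y].add(x)
                # Only the z values indexed under this y are visited.
                derived.update((x, y, z) for z in zs_by_y[y])
        elif pred == "Q":
            y, z = a, b
            if z not in zs_by_y[y]:
                zs_by_y[y].add(z)
                derived.update((x, y, z) for x in xs_by_y[y])
    return derived


facts = [("P", 1, 2), ("Q", 2, 5), ("P", 3, 2), ("Q", 4, 6)]
triples = join_closure(facts)
```

Each new assertion touches only the partners stored under its own value of $y$, so the work per assertion is proportional to the number of firings it completes.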

As a first application of theorem 3 consider the transitive closure algorithm
defined by the inference rules $\text{EDGE}(x, y) \to \text{PATH}(x, y)$ and
$\text{EDGE}(x, y) \wedge \text{PATH}(y, z) \to \text{PATH}(x, z)$. If $R$ consists of these two rules and $D$ consists of $e$
assertions of the form $\text{EDGE}(c, d)$ involving $n$ constants then we immediately have
that $|P_R(R(D))|$ is $O(en)$. So theorem 3 immediately implies that the algorithm
runs in $O(en)$ time.
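
A minimal sketch of these two rules as a worklist computation is given below. It assumes edges are given as pairs of hashable constants, and the name `transitive_closure` is hypothetical. It also counts how often the two-antecedent rule fires, so the $O(en)$ bound can be checked on small inputs.

```python
from collections import defaultdict, deque


def transitive_closure(edges):
    """Return (paths, firings) for EDGE(x,y) -> PATH(x,y) and
    EDGE(x,y), PATH(y,z) -> PATH(x,z)."""
    edge_into = defaultdict(set)          # y -> all x with EDGE(x, y)
    for x, y in edges:
        edge_into[y].add(x)

    paths = set()
    queue = deque()

    def assert_path(x, z):
        if (x, z) not in paths:           # each PATH assertion is queued once
            paths.add((x, z))
            queue.append((x, z))

    for x, y in edges:                    # first rule
        assert_path(x, y)

    firings = 0
    while queue:                          # second rule, driven by new PATH facts
        y, z = queue.popleft()
        for x in edge_into[y]:
            firings += 1
            assert_path(x, z)
    return paths, firings


edges = [(1, 2), (2, 3), (3, 4)]
paths, firings = transitive_closure(edges)
```

Every counted firing pairs one EDGE assertion with one PATH assertion sharing the middle constant, so the count is bounded by $e$ times the number of constants.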



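$$\frac{U \to a \qquad \text{INPUT}(\text{CONS}(a, j))}{\text{PARSES}(U, \text{CONS}(a, j), j)}$$

$$\frac{A \to B\,C \qquad \text{PARSES}(B, i, j) \qquad \text{PARSES}(C, j, k)}{\text{PARSES}(A, i, k)}$$

$$\frac{\text{INPUT}(\text{CONS}(X, Y))}{\text{INPUT}(Y)}$$

Fig. 1. The Cocke-Kasami-Younger (CKY) parsing algorithm. $\text{PARSES}(u, i, j)$ means
that the substring from $i$ to $j$ parses as nonterminal $u$.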



As a second example, consider the algorithm for context free parsing shown
in figure 1. The grammar is given in Chomsky normal form and consists of a set
of assertions of the form $X \to a$ and $X \to Y Z$. The input string is represented as
a “lisp list” of the form $\text{CONS}(a_1, \text{CONS}(a_2, \ldots \text{CONS}(a_n, \text{NIL})))$ and the input
string is specified by an assertion of the form $\text{INPUT}(s)$. Let $g$ be the number
of productions in the grammar and let $n$ be the length of the input string.
Theorem 3 immediately implies that this algorithm runs in $O(gn^3)$ time. Note
that there is a rule with six variables: three string index variables and three
grammar nonterminal variables.
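
The rules of figure 1 can be run bottom-up with an agenda, as in the sketch below. Two simplifications are assumptions of this example and not part of the figure: string positions are integers rather than suffixes of the CONS list, and the grammar is passed as two Python lists. The name `cky_parses` is hypothetical.

```python
from collections import defaultdict


def cky_parses(unary, binary, tokens):
    """Derive all PARSES(u, i, j) facts.

    unary:  list of (U, a) for productions U -> a
    binary: list of (A, B, C) for productions A -> B C
    """
    facts, agenda = set(), []

    def add(fact):
        if fact not in facts:
            facts.add(fact)
            agenda.append(fact)

    for i, a in enumerate(tokens):            # rule for U -> a
        for u, terminal in unary:
            if terminal == a:
                add((u, i, i + 1))

    by_left = defaultdict(list)               # B -> [(A, C)]
    by_right = defaultdict(list)              # C -> [(A, B)]
    for a, b, c in binary:
        by_left[b].append((a, c))
        by_right[c].append((a, b))

    starts = defaultdict(set)                 # (u, i) -> {j : PARSES(u, i, j)}
    ends = defaultdict(set)                   # (u, j) -> {i : PARSES(u, i, j)}
    while agenda:
        nt, i, j = agenda.pop()
        starts[(nt, i)].add(j)
        ends[(nt, j)].add(i)
        for a, c in by_left[nt]:              # fact plays the role of B
            for k in starts[(c, j)]:
                add((a, i, k))
        for a, b in by_right[nt]:             # fact plays the role of C
            for h in ends[(b, i)]:
                add((a, h, j))
    return facts


grammar_unary = [("A", "a"), ("B", "b")]
grammar_binary = [("S", "A", "B")]
parsed = cky_parses(grammar_unary, grammar_binary, ["a", "b"])
```

Each fact is indexed when it leaves the agenda and joined only with facts indexed earlier, so every pair of adjacent constituents is combined exactly once.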



5 Basic Examples 

Figure 2 gives a simple first-order data flow analysis algorithm. The algorithm
takes as input a set of assignment statements of the form $\text{ASSIGN}(x, e)$ where
$x$ is a program variable and $e$ is either a “constant expression” of the form
$\text{CONSTANT}(n)$, a tuple expression of the form $\langle y, z \rangle$ where $y$ and $z$ are program
variables, or a projection expression of the form $\Pi_1(y)$ or $\Pi_2(y)$ where $y$ is a
program variable. Consider a database $D$ containing $e$ assignment assertions
involving $n$ program variables and pair expressions. Clearly the first rule (upper
left corner) has at most $e$ firings. The transitivity rule has at most $n^3$ firings.
The other two rules have at most $en$ firings. Since $e$ is $O(n^2)$, theorem 3 implies
that the algorithm given in figure 2 runs in $O(n^3)$ time.
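
To make the four rules of figure 2 concrete, here is a deliberately naive fixpoint sketch. It repeats whole passes over the derived facts and so does not achieve the cubic bound of theorem 3; it only shows what the rules derive. The tuple encoding of expressions and the name `dataflow` are assumptions of this example.

```python
def dataflow(assigns):
    """Close a set of ASSIGN facts under the four rules of figure 2.

    assigns: iterable of (variable, expression); an expression is
    ("const", n), ("pair", y, z) or ("proj", j, x) with j in {1, 2}.
    Returns the set of (target, source) pairs, read as target <= source.
    """
    assigns = list(assigns)
    # Rules for constant and pair assignments.
    flows = {(x, e) for x, e in assigns if e[0] in ("const", "pair")}
    while True:
        new = set()
        # Projection rule: ASSIGN(y, proj_j(x)) and x <= <z1, z2> give y <= z_j.
        for y, e in assigns:
            if e[0] == "proj":
                _, j, x = e
                for target, source in flows:
                    if target == x and isinstance(source, tuple) and source[0] == "pair":
                        new.add((y, source[j]))
        # Transitivity rule: u <= w and w <= v give u <= v.
        for u, w in flows:
            for w2, v in flows:
                if w == w2:
                    new.add((u, v))
        if new <= flows:
            return flows
        flows |= new


program = [
    ("a", ("const", 7)),
    ("b", ("const", 8)),
    ("p", ("pair", "a", "b")),
    ("q", ("proj", 1, "p")),
]
flows = dataflow(program)
```

In this small program the constant 7 reaches `q` through the first component of `p`, while the constant 8 does not.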

It is possible to show that determining whether a given value can reach a
given variable, as defined by the rules in figure 2, is 2NPDA-complete [10,16].
2NPDA is the class of languages recognizable by a two-way nondeterministic
pushdown automaton. A language $\mathcal{L}$ will be called 2NPDA-hard if any problem
in 2NPDA can be reduced to $\mathcal{L}$ in $n\,\mathrm{polylog}\,n$ time. We say that a problem can be
solved in sub-cubic time if it can be solved in $O(n^k)$ time for $k < 3$. If a 2NPDA-hard
problem can be solved in sub-cubic time then all problems in 2NPDA can
be solved in sub-cubic time. The data flow problem is 2NPDA-complete in the
sense that it is in the class 2NPDA and is 2NPDA-hard.

Cubic time is impractical for many applications. If the problem is changed 
slightly so as to require that the assignment statements are well typed using 
types of a bounded size, then the problem of determining if a given value can 
reach a given variable can be solved in linear time. This can be done with sub- 
transitive data flow analysis [9] . In the first-order setting of the rules in figure 2 
we use the types defined by the following grammar. 

$\tau ::= \text{INT} \mid \langle \tau, \tau \rangle$

Note that this grammar does not allow for recursive types. The linear time 
analysis can be extended to handle list types and recursive types but giving 
an analysis weaker than that of figure 2. For simplicity we will avoid recursive 
types here. We now consider a database containing assignment statements such 
as those described above but subject to the constraint that it must be possible 
to assign every variable a type such that every assignment is well typed. For 
example, if the data base contains $\text{ASSIGN}(x, \langle y, z \rangle)$ then $x$ must have type
$\langle \tau, \sigma \rangle$ where $\tau$ and $\sigma$ are the types of $y$ and $z$ respectively. Similarly, if the
database contains $\text{ASSIGN}(y, \Pi_1(x))$ then $x$ must have a type of the form $\langle \tau, \sigma \rangle$
where $y$ has type $\tau$. Under these assumptions we can use the inference rules
given in figure 3. 

Note that the rules in figure 3 are not syntactically local. The inference rule
at the lower right contains a term in the conclusion, namely $\Pi_j(e_2)$, which is
not contained in any antecedent. This rule does introduce new terms. However,
it is not difficult to see that the rules maintain the invariant that for every
derived assertion of the form $e_1 \Leftarrow e_2$ we have that $e_1$ and $e_2$ have the same
type. This implies that every newly introduced term must be well typed. For
example, if the rules construct the expression $\Pi_1(\Pi_2(x))$ then $x$ must have a
type of the form $\langle \tau, \langle \sigma, \eta \rangle \rangle$. Since the type expressions are finite, there are only
finitely many such well typed terms. So the inference process must terminate.

$$\frac{\text{ASSIGN}(x, \text{CONSTANT}(n))}{x \Leftarrow \text{CONSTANT}(n)}
\qquad
\frac{\text{ASSIGN}(y, \Pi_j(x)) \qquad x \Leftarrow \langle z_1, z_2 \rangle}{y \Leftarrow z_j}$$

$$\frac{\text{ASSIGN}(x, \langle y, z \rangle)}{x \Leftarrow \langle y, z \rangle}
\qquad
\frac{u \Leftarrow w \qquad w \Leftarrow v}{u \Leftarrow v}$$

Fig. 2. A data flow analysis algorithm. The rule involving $\Pi_j$ is an abbreviation for
two rules, one with $\Pi_1$ and one with $\Pi_2$.



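$$\frac{\text{ASSIGN}(x, \text{CONSTANT}(n))}{x \Leftarrow \text{CONSTANT}(n)}
\qquad
\frac{e \Leftarrow \Pi_j(x)}{\text{COMPUTE-}\Pi_j(x)}$$

$$\frac{\text{ASSIGN}(x, \langle y, z \rangle)}{\Pi_1(x) \Leftarrow y \qquad \Pi_2(x) \Leftarrow z}$$

$$\frac{\text{ASSIGN}(y, \Pi_j(x))}{y \Leftarrow \Pi_j(x)}
\qquad
\frac{e_1 \Leftarrow e_2 \qquad \text{COMPUTE-}\Pi_j(e_1)}{\Pi_j(e_1) \Leftarrow \Pi_j(e_2)}$$

Fig. 3. Sub-transitive data flow analysis. A rule with multiple conclusions represents
multiple rules, one for each conclusion.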



$$\frac{\text{SOURCE}(x)}{\text{REACHES}(x)}
\qquad
\frac{z \Leftarrow y \qquad \text{REACHES}(y)}{\text{REACHES}(z)}$$

Fig. 4. Determining the existence of a path from a given source.

$$\frac{\text{INPUT}((f\ w))}{\text{INPUT}(f) \qquad \text{INPUT}(w)}
\qquad
\frac{\text{INPUT}((f\ w)) \qquad f \to \lambda x.u}{x \to w \qquad (f\ w) \to u}$$

$$\frac{\text{INPUT}(\lambda x.e)}{\text{INPUT}(e) \qquad \lambda x.e \to \lambda x.e}
\qquad
\frac{\text{INPUT}(\Pi_j(u)) \qquad u \to \langle e_1, e_2 \rangle}{\Pi_j(u) \to e_j}$$

$$\frac{\text{INPUT}(\langle e_1, e_2 \rangle)}{\text{INPUT}(e_1) \qquad \text{INPUT}(e_2) \qquad \langle e_1, e_2 \rangle \to \langle e_1, e_2 \rangle}
\qquad
\frac{u \to w \qquad w \to v}{u \to v}$$

$$\frac{\text{INPUT}(\Pi_j(u))}{\text{INPUT}(u)}$$

Fig. 5. A flow analysis algorithm for the $\lambda$-calculus with pairing. The rules are intended
to be applied to an initial database containing a single assertion of the form $\text{INPUT}(e)$
where $e$ is a closed $\lambda$-calculus term which has been $\alpha$-renamed so that distinct bound
variables have distinct names. Note that the rules are syntactically local: every
term in a conclusion appears in some antecedent. Hence all terms in derived assertions
are subterms of the input term. The rules compute a directed graph on the subterms
of the input.



In fact if no variable has a type involving more than $b$ syntax nodes then the
inference process terminates in linear time. To see this it suffices to observe that
the rules maintain the invariant that every derived assertion involving $\Leftarrow$ is
of the form $\Pi_{j_1}(\ldots \Pi_{j_n}(e_1)) \Leftarrow \Pi_{j_1}(\ldots \Pi_{j_n}(e_2))$ where the assertion $e_1 \Leftarrow e_2$ is
derived directly from an assignment using one of the rules on the left hand side
of the figure. If the type of $x$ has only $b$ syntax nodes then an input assignment
of the form $\text{ASSIGN}(x, e)$ can lead to at most $b$ derived $\Leftarrow$ assertions. So if there
are $n$ assignments in the input data base then there are at most $bn$ derived
assertions involving $\Leftarrow$. It is now easy to check that each inference rule has at
most $bn$ firings. So by theorem 3 we have that the algorithm runs in $O(n)$ time.

It is possible to show that these rules construct a directed graph whose tran- 
sitive closure includes the graph constructed by the rules in figure 2. So to deter- 
mine if a given source value flows to a given variable we need simply determine 
if there is a path from the source to the variable. It is well known that one can 
determine in linear time whether a path exists from a given source node to any 
other node in a directed graph. However, we can also note that this computation
can be done with the algorithm shown in figure 4. The fact that the algorithm
in figure 4 runs in linear time is guaranteed by Theorem 3.
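
The two rules of figure 4 amount to a graph search from the source nodes. The sketch below assumes that each flow assertion $z \Leftarrow y$ is given as a pair `(z, y)`; the name `reaches` is hypothetical.

```python
from collections import defaultdict


def reaches(sources, flows):
    """Return every node z with REACHES(z).

    flows: iterable of (z, y) pairs, read as z <= y (a value flows from y to z).
    """
    out = defaultdict(list)          # y -> all z with z <= y
    for z, y in flows:
        out[y].append(z)
    reached = set(sources)           # rule SOURCE(x) |- REACHES(x)
    stack = list(reached)
    while stack:                     # rule z <= y, REACHES(y) |- REACHES(z)
        y = stack.pop()
        for z in out[y]:
            if z not in reached:
                reached.add(z)
                stack.append(z)
    return reached


reached = reaches({"c"}, [("a", "c"), ("b", "a"), ("d", "e")])
```

Every flow assertion is inspected at most once, when its right-hand node is taken off the stack, which is where the linear bound comes from.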

As another example, figure 5 gives an algorithm for both control and data
flow in the $\lambda$-calculus extended with pairing and projection operations. These
rules implement a form of set based analysis [1,8]. The rules can also be used
to determine if the given term is typable by recursive types with function, pairing,
and union types [15] using arguments similar to those relating control flow
analysis to partial types [13,19]. A detailed discussion of the precise relationship
between the rules in figure 5, set based analysis, and recursive types is beyond
the scope of this paper. Here we are primarily concerned with the complexity
analysis of the algorithm. All rules other than the transitivity rule have at most
$n^2$ prefix firings and the transitivity rule has at most $n^3$ firings. Hence theorem 3
implies that the algorithm runs in $O(n^3)$ time.

It is possible to give a sub-transitive flow algorithm analogous to the rules
in figure 5 which runs in linear time under the assumption that the input
expression is well typed and that every type expression has bounded size [9]. 
However, the sub-transitive version of figure 5 is beyond the scope of this paper. 

6 Algorithms Based on Union-Find 

A variety of program analysis algorithms exploit equality. Perhaps the most
fundamental use of equality in program analysis is the use of unification in type
inference for simple types. Other examples include the nearly linear time flow
analysis algorithm of Bondorf and Jørgensen [3], the quadratic type inference
algorithm for an Abadi-Cardelli object calculus given by Henglein [11], and the
dramatic improvement in empirical performance due to equality reported by
Fähndrich et al. in [6]. Here we formulate a general approach to the incorporation
of union-find methods into algorithms defined by bottom-up inference rules. In
this section we give a general meta-complexity theorem for such union-find rule
sets.

We let UNION, FIND, and MERGE be three distinguished binary predicate symbols.
The predicate UNION can appear in rule conclusions but not in rule antecedents.
The predicates FIND and MERGE can appear in rule antecedents but
not in rule conclusions. A bottom-up bound rule set satisfying these conventions
will be called a union-find rule set. Intuitively, an assertion of the form
$\text{UNION}(u, w)$ in the conclusion of a rule means that $u$ and $w$ should be made
equivalent. An assertion of the form $\text{MERGE}(u, w)$ means that at some point a
union operation was applied to $u$ and $w$ and, at the time of that union operation,
$u$ and $w$ were not equivalent. An assertion $\text{FIND}(u, f)$ means that at some point
the find of $u$ was the value $f$.

For any given database we define the merge graph to be the undirected graph
containing an edge between $s$ and $w$ if either $\text{MERGE}(s, w)$ or $\text{MERGE}(w, s)$ is in
the database. If there is a path from $s$ to $w$ in the merge graph then we say
that $s$ and $w$ are equivalent. We say that a database is union-find consistent if
for every term $s$ whose equivalence class contains at least two members there
exists a unique term $f$ such that for every term $w$ in the equivalence class of $s$ the
database contains $\text{FIND}(w, f)$. This unique term is called the find of $s$. Note that
a database not containing any MERGE or FIND assertions is union-find consistent.
We now define the result of performing a union operation on the terms $s$ and $t$ in
a union-find consistent database. If $s$ and $t$ are already equivalent then the union
operation has no effect. If $s$ and $t$ are not equivalent then the union operation
adds the assertion $\text{MERGE}(s, t)$ plus all assertions of the form $\text{FIND}(w, f)$ where
$w$ is equivalent to either $s$ or $t$ and $f$ is the find of the larger equivalence class
if either equivalence class contains more than one member; otherwise $f$ is the
term $t$. The fact that the find value is the second argument if both equivalence
classes are singleton is significant for the complexity analysis of the unification
and congruence-closure algorithms. Note that if either class contains more than
one member, and $w$ is in the larger class, then the assertion $\text{FIND}(w, f)$ does not
need to be added. With appropriate indexing the union operation can be run in
time proportional to the number of new assertions added, i.e., the size of the smaller
equivalence class. Also note that whenever the find value of a term changes the
size of the equivalence class of that term at least doubles. This implies that for a
given term $s$ the number of terms $f$ such that the database contains $\text{FIND}(s, f)$ is at most
log (base 2) of the size of the equivalence class of $s$.
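
The union operation just described can be sketched as a small class that never erases FIND assertions. The class name `UnionFindDB` and its fields are assumptions of this illustration. Ties in class size are resolved in favour of the second argument, so that two singletons receive the second argument as their find value, as the text requires.

```python
class UnionFindDB:
    def __init__(self):
        self.rep = {}        # term -> key of its equivalence class
        self.members = {}    # class key -> set of member terms
        self.finds = set()   # every FIND(w, f) ever asserted; never erased
        self.merges = []     # MERGE(s, t) assertions

    def _key(self, s):
        if s not in self.rep:            # unseen terms start as singletons
            self.rep[s] = s
            self.members[s] = {s}
        return self.rep[s]

    def union(self, s, t):
        """Perform UNION(s, t); return the number of FIND assertions added."""
        rs, rt = self._key(s), self._key(t)
        if rs == rt:
            return 0                     # already equivalent: no effect
        self.merges.append((s, t))
        if len(self.members[rt]) >= len(self.members[rs]):
            big, small = rt, rs
        else:
            big, small = rs, rt
        relabel = set(self.members[small])
        if len(self.members[big]) == 1:
            # Two singletons: the surviving term has no FIND assertion yet.
            relabel |= self.members[big]
        self.finds.update((w, big) for w in relabel)
        for w in self.members[small]:
            self.rep[w] = big
        self.members[big] |= self.members.pop(small)
        return len(relabel)


db = UnionFindDB()
for s, t in [(0, 1), (2, 3), (4, 5), (6, 7), (1, 3), (5, 7), (3, 7)]:
    db.union(s, t)
finds_per_term = {w: sum(1 for (v, _) in db.finds if v == w) for w in range(8)}
```

After these seven unions all eight terms are equivalent, and no term carries more than three FIND assertions, matching the base-2 logarithm of the class size.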

Of course in practice one should erase obsolete find assertions so that at any 
one time for any term s there is at most one assertion of the form FIND(s, /). 
However, because find assertions can generate conclusions before they are erased, 
the erasure process does not improve the bound given in theorem 4 below. In 
fact, such erasure makes the theorem more difficult to state. In order to allow 
for a relatively simple meta-complexity theorem we do not erase obsolete FIND
assertions. 

We define a clean database to be one not containing MERGE or FIND assertions.
Given a union-find rule set $R$ and a clean database $D$ we say that a
database $E$ is an $R$-closure of $D$ if $E$ can be derived from $D$ by repeatedly applying
rules in $R$, including rules that result in union operations, and no
further application of rules in $R$ changes $E$. Unlike the case of traditional inference
rules, a union-find rule set can have many possible closures: the set of
derived assertions depends on the order in which the rules are used. For example
if we derive the three union operations $\text{UNION}(u, w)$, $\text{UNION}(s, w)$, and $\text{UNION}(u, s)$
then the merge graph will contain only two arcs and the graph depends on the
order in which the union operations are done. If rules are used to derive other
assertions from the MERGE assertions then arbitrary relations can depend on the
order of inference. For most algorithms, however, the correctness analysis and
running time analysis can be done independently of the order in which the rules
are run. We now present a general meta-complexity theorem for union-find rule
sets.



Theorem 4. For any union-find rule set $R$ there exists an algorithm mapping $D$
to an $R$-closure of $D$, denoted as $R(D)$, that runs in time
$O(|D| + |P_R(R(D))| + |F(R(D))|)$ where $F(R(D))$ is the set of FIND assertions in $R(D)$.

The proof is essentially identical to the proof of theorem 3. The same source-to-source
transformation is applied to $R$ to show that without loss of generality
we need only consider single antecedent rules plus rules of the form
$P(x,y) \wedge Q(y,z) \to R(x,y,z)$ where $x$, $y$, and $z$ are variables and $P$, $Q$, and
$R$ are predicates other than UNION, FIND, or MERGE. For all the rules that do not
have a UNION assertion in their conclusion the argument is the same as before.
Rules with union operations in the conclusion are handled using the union operation
which has unit cost for each prefix firing leading to a redundant union
operation and where the cost of a non-redundant operation is proportional to
the number of new FIND assertions added.

7 Basic Union-Find Examples 

Figure 6 gives a unification algorithm. The essence of the unification problem
is that if a pair $\langle s, t \rangle$ is unified with $\langle u, w \rangle$ then one must recursively unify $s$
with $u$ and $t$ with $w$. The rules guarantee that if $\langle s, t \rangle$ is equivalent to $\langle u, w \rangle$
then $s$ and $u$ are both equivalent to the term $\Pi_1(f)$ where $f$ is the common
find of the two pairs. Similarly, $t$ and $w$ must also be equivalent. So the rules
compute the appropriate equivalence relation for unification. However, the rules
do not detect clashes or occurs-check failures. This can be done by performing
appropriate linear-time computations on the final find map.

To analyze the running time of the rules in figure 6 we first note that the rules
maintain the invariant that all find values are terms appearing in the input problem.
This implies that every union operation is either of the form $\text{UNION}(s, w)$
or $\text{UNION}(\Pi_i(w), s)$ where $s$ and $w$ appear in the input problem. Let $n$ be the number
of distinct terms appearing in the input. We now have that there are only
$O(n)$ terms involved in the equivalence relation defined by the merge graph.
For a given term $s$ the number of assertions of the form $\text{FIND}(s, f)$ is at most
the log (base 2) of the size of the equivalence class of $s$. So we now have that
there are only $O(n \log n)$ FIND assertions in the closure. This implies that there
are only $O(n \log n)$ prefix firings. Theorem 4 now implies that the closure can



$$\frac{\text{EQUATE!}(x, y)}{\text{UNION}(x, y)}
\qquad
\frac{\text{FIND}(\langle x, y \rangle, f)}{\text{UNION}(\Pi_1(f), x) \qquad \text{UNION}(\Pi_2(f), y)}$$

Fig. 6. A unification algorithm. The algorithm operates on “simple terms” defined
to be either a constant, a variable, or a pair of simple terms. The input database is
assumed to be a set of assertions of the form $\text{EQUATE!}(s, w)$ where $s$ and $w$ are simple
terms. The rules generate the appropriate equivalence relation for unification but do
not generate clashes or occurs-check failures (see the text).







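$$\frac{\text{EQUATE!}(x, y)}{\text{INPUT}(x) \qquad \text{INPUT}(y)}
\qquad
\frac{\text{INPUT}(x)}{\text{ID-OR-FIND}(x, x)}$$

$$\frac{\text{INPUT}(\langle x, y \rangle)}{\text{INPUT}(x) \qquad \text{INPUT}(y)}
\qquad
\frac{\text{FIND}(x, y)}{\text{ID-OR-FIND}(x, y)}$$

$$\frac{\text{EQUATE!}(x, y)}{\text{UNION}(x, y)}
\qquad
\frac{\text{INPUT}(\langle x, y \rangle) \qquad \text{ID-OR-FIND}(x, x') \qquad \text{ID-OR-FIND}(y, y')}{\text{UNION}(\langle x', y' \rangle, \langle x, y \rangle)}$$

Fig. 7. A congruence closure algorithm. The input data base is assumed to consist of
a set of assertions of the form $\text{EQUATE!}(s, w)$ and $\text{INPUT}(s)$ where $s$ and $w$ are simple
terms (as defined in the caption for figure 6).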



be computed in $O(n \log n)$ time. The best known unification algorithm runs in
$O(n)$ time [20] and the best on-line unification algorithm runs in $O(n\alpha(n))$ time
where $\alpha$ is the inverse of Ackermann’s function. The application of theorem 4 to
the rules of figure 6 yields a slightly worse running time for what is, perhaps, a
simpler presentation.
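
The two rules of figure 6 can be sketched on top of a relabel-the-smaller-class union operation, as below. The encoding is an assumption of this example: pairs are tuples `("pair", a, b)`, projection terms are tuples `("proj", j, f)`, and atoms are plain strings. The name `unify` is hypothetical. As in the figure, no clash or occurs-check detection is attempted.

```python
from collections import deque


def unify(equations):
    """Return a predicate same(s, t) for the equivalence generated by figure 6."""
    rep, members = {}, {}
    agenda = deque()                      # pending FIND(w, f) assertions

    def key(s):
        if s not in rep:                  # unseen terms start as singletons
            rep[s], members[s] = s, {s}
        return rep[s]

    def union(s, t):
        rs, rt = key(s), key(t)
        if rs == rt:
            return
        # Ties favour the class of t, so two singletons get find value t.
        big, small = (rt, rs) if len(members[rt]) >= len(members[rs]) else (rs, rt)
        relabel = set(members[small])
        if len(members[big]) == 1:
            relabel |= members[big]
        for w in members[small]:
            rep[w] = big
        members[big] |= members.pop(small)
        agenda.extend((w, big) for w in relabel)

    for s, w in equations:                # EQUATE!(x, y) |- UNION(x, y)
        union(s, w)
    while agenda:                         # FIND(<x, y>, f) |- two UNION conclusions
        w, f = agenda.popleft()
        if isinstance(w, tuple) and w[0] == "pair":
            union(("proj", 1, f), w[1])
            union(("proj", 2, f), w[2])

    return lambda s, t: key(s) == key(t)


same = unify([(("pair", "a", "b"), ("pair", "c", "d"))])
```

Equating the two pairs makes `a` equivalent to `c` and `b` equivalent to `d`, each through the corresponding projection of the common find value.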

Now we consider the congruence closure algorithm given in figure 7. First
we consider its correctness. The fundamental property of congruence closure is
that if $s$ is equivalent to $s'$ and $t$ is equivalent to $t'$ and the pairs $\langle s, t \rangle$ and
$\langle s', t' \rangle$ both appear in the input, then $\langle s, t \rangle$ should be equivalent to $\langle s', t' \rangle$.
This fundamental property is guaranteed by the lower right hand rule in figure 7.
This rule guarantees that if $\langle s, t \rangle$ and $\langle s', t' \rangle$ both occur in the input and $s$ is
equivalent to $s'$ and $t$ to $t'$ then both $\langle s, t \rangle$ and $\langle s', t' \rangle$ are equivalent to $\langle f_1, f_2 \rangle$
where $f_1$ is the common find of $s$ and $s'$ and $f_2$ is the common find of $t$ and $t'$.
So the algorithm computes the congruence closure equivalence relation.

To analyze the complexity of the rules in figure 7 we first note that, as in
the case of unification, the rules maintain the invariant that every find value is
an input term. Given this, one can see that all terms involved in the equivalence
relation are either input terms or pairs of input terms. This implies that there
are at most $O(n^2)$ terms involved in the equivalence relation where $n$ is the
number of distinct terms in the input. So we have that for any given term $s$
the number of assertions of the form $\text{FIND}(s, f)$ is $O(\log n)$. So the number of
firings of the congruence rule is $O(n \log^2 n)$. But this implies that the number of
terms involved in the equivalence relation is actually only $O(n \log^2 n)$. Since each

$$\frac{\text{INPUT}((f\ u))}{\text{INPUT}(f) \qquad \text{INPUT}(u)}
\qquad
\frac{\text{INPUT}((f\ u))}{\begin{array}{c}\text{UNION}(\text{DOM}(\text{TYPE}(f)), \text{TYPE}(u)) \\ \text{UNION}(\text{RAN}(\text{TYPE}(f)), \text{TYPE}((f\ u)))\end{array}}$$

$$\frac{\text{INPUT}(\lambda x.u)}{\text{INPUT}(u)}
\qquad
\frac{\text{INPUT}(\lambda x.u)}{\begin{array}{c}\text{UNION}(\text{TYPE}(x), \text{DOM}(\text{TYPE}(\lambda x.u))) \\ \text{UNION}(\text{TYPE}(u), \text{RAN}(\text{TYPE}(\lambda x.u)))\end{array}}$$

Fig. 8. Type inference for simple types. The input database is assumed to consist of a
single assertion of the form $\text{INPUT}(e)$ where $e$ is a closed term of the pure $\lambda$-calculus and
where distinct bound variables have been $\alpha$-renamed to have distinct names. As in the
case of the unification algorithm, these rules only construct the appropriate equivalence
relation on types. An occurs-check on the resulting equivalence relation must be done
elsewhere.



such term can appear in the left hand side of at most $O(\log n)$ FIND assertions,
there can be at most $O(n \log^3 n)$ FIND assertions. Theorem 4 now implies that
the closure can be computed in $O(n \log^3 n)$ time. It is possible to show that by
erasing obsolete FIND assertions the algorithm can be made to run in $O(n \log n)$
time, the best known running time for congruence closure.

We leave it to the reader to verify that the inference rules in figure 8 define 
the appropriate equivalence relation on the types of the program expressions and 
that the types can be constructed in linear time from the find relation output 
by the procedure. It is clear that the inference rules generate only $O(n)$ union
operations and hence the closure can be computed in $O(n \log n)$ time.

8 Henglein’s Quadratic Algorithm 

We now consider Henglein’s quadratic time algorithm for determining typabil- 
ity in a variant of the Abadi-Cardelli object calculus [11]. This algorithm is 
interesting because the first algorithm published for the problem was a classical
dynamic transitive closure algorithm requiring $O(n^3)$ time [18] and because
Henglein’s presentation of the quadratic algorithm is given in classical pseudo-code
and is fairly complex.

A simple union-find rule set for Henglein’s algorithm is given in figure 9.
The algorithm checks for the consistency of a set of type constraints. Types are
defined by the grammar $\sigma ::= \alpha \mid [\ell_1 = \sigma_1; \ldots; \ell_n = \sigma_n]$ where $\alpha$ represents
a type variable and $\ell_i \neq \ell_j$ for $i \neq j$. Intuitively, an object $o$ has type
$[\ell_1 = \sigma_1; \ldots; \ell_n = \sigma_n]$ if it provides a slot (or field) for each slot name $\ell_i$ and for

[Figure 9: inference rules over assertions of the forms $\tau \le \sigma$, $\sigma \to \tau$, $\sigma \Rightarrow \tau$, $\text{EQUAL}(\tau, \sigma)$, $\text{UNION}(\tau, \sigma)$, $\text{MERGE}(\tau, \sigma)$, and $\text{ACCEPTS}(\tau, \ell)$; the rule layout is not recoverable from the extracted text.]

Fig. 9. Henglein’s type inference algorithm.



each such slot name we have that the slot value $o.\ell_i$ of $o$ for slot $\ell_i$ has type $\sigma_i$.
The rules in figure 9 assume that the input has been preprocessed to include
all assertions of the form $\text{ACCEPTS}([\ell_1 = \sigma_1; \ldots; \ell_n = \sigma_n], \ell_i)$ and
$\text{EQUAL}([\ell_1 = \sigma_1; \ldots; \ell_n = \sigma_n].\ell_i, \sigma_i)$ with $1 \le i \le n$. Note that this pre-processing can be
done in linear time. The antecedent $\ell_1 \neq \ell_2$ means that $\ell_1$ and $\ell_2$ are distinct
terms. Although this is outside of the normal notion of first-order Horn clause
it is not difficult to show that theorems 3 and 4 still hold in the presence of
such antecedents. Quadratic time inference is possible because the type system
is “invariant”: if $[\ell_1 = \sigma_1; \ldots; \ell_n = \sigma_n] \le [m_1 = \tau_1; \ldots; m_k = \tau_k]$ then we must
have that each $m_i$ is equal to some $\ell_j$ where $\sigma_j$ equals $\tau_i$. This property justifies
the final rule (lower right) in figure 9. A system of constraints is rejected if the
final database contains $\text{ACCEPTS}(\sigma, \ell)$ and $\tau \to \sigma$, but not $\text{ACCEPTS}(\tau, \ell)$.

To analyze the complexity of the algorithm in figure 9 note that all terms
involved in the equivalence relation are type expressions appearing in the processed
input: each such expression is either a type expression of the original
unprocessed input or of the form $\sigma.\ell$ where $\sigma$ is in the original input and $\ell$ is a
slot name appearing at the top level of $\sigma$. Let $n$ be the number of assertions in the
processed input. Note that the pre-processing guarantees that there is at least
one input assertion for each type expression so the number of type expressions
appearing in the input is also $O(n)$. Since there are $O(n)$ terms involved in the
equivalence relation the rules can generate at most $O(n)$ MERGE assertions. This
implies that the rules generate only $O(n)$ assertions of the form $\sigma \Rightarrow \tau$. This
implies that the number of prefix firings is $O(n^2)$. Since there are $O(n)$ terms
involved in the equivalence relation there are $O(n \log n)$ FIND assertions in the
closure. Theorem 4 now implies that the running time is $O(n^2 + n \log n) = O(n^2)$.

9 Conclusions

This paper has argued that bottom-up logic programming is a natural formal- 
ism for expressing program analysis algorithms. The paper can be summarized 
with four points. First, a bottom-up logic program is a set of inference rules 
and most static analysis algorithms have natural representations as inference 
rules. Second, bottom-up logic programs are machine-readable and can be automatically
compiled into a running algorithm. Third, theorems 3 and 4 often
allow the running time of a bottom-up logic program to be determined by
inspection. Fourth, it appears that most program analyses can be performed by
natural bottom-up logic programs whose running time is either the best known 
or within a polylog factor of the best known. 

In the case of unification and Henglein’s algorithm final checks were per- 
formed by a post-processing pass. It is possible to extend the logic programming 
language in ways that allow more algorithms to be fully expressed as rules. Stratified
negation by failure would allow a natural way of inferring $\text{NOT}(\text{ACCEPTS}(\sigma, \ell))$
in Henglein’s algorithm while preserving the truth of theorems 3 and 4. This
would allow the acceptability check to be done with rules. A simple extension 
of the union-find formalism would allow the detection of an equivalence between 
distinct “constants” and hence allow the rules for unification to detect clashes. 
It might also be possible to extend the language to improve the running time for 
cycle detection and strongly connected component analysis for directed graphs. 

Another direction for further work involves aggregation. It would be nice 
to have language features and meta-complexity theorems allowing natural and 
efficient renderings of Dijkstra’s shortest path algorithm and the inside algorithm 
for computing the probability of a given string in a probabilistic context free 
grammar. 

The main conclusion is that, for many algorithms, bottom-up logic program
presentations are clearer and simpler to analyze, both for correctness and for
complexity, than classical pseudo-code presentations.

Abstract. In the past two decades, model-checking has emerged as
a promising and powerful approach to fully automatic verification of
hardware systems. But model-checking technology can be usefully applied
to other application areas, and this article provides fundamentals
that a practitioner can use to translate verification problems into model-checking
questions. A taxonomy of the notions of “model,” “property,”
and “model checking” is presented, and three standard model-checking
approaches are described and applied to examples.

1 Introduction 

In the last two decades model-checking [11,34] has emerged as a promising and
powerful approach to automatic verification of systems. Roughly speaking, a
model checker is a procedure that decides whether a given structure $M$ is a
model of a logical formula $\phi$, i.e. whether $M$ satisfies $\phi$, abbreviated $M \models \phi$.
Intuitively, $M$ is an (abstract) model of the system in question, typically a finite
automata-like structure, and $\phi$, typically drawn from a temporal or modal logic,
specifies a desirable property. The model-checker then provides a push-button
approach for proving that the system modeled by $M$ enjoys this property. This
full automation, together with the fact that efficient model-checkers can be constructed
for powerful logics, forms the attractiveness of model-checking.

The above “generic” description of model-checking leaves room for refine- 
ment. What exactly is a model to be checked? What kind of formulas are used? 
What is the precise interpretation of satisfaction, $\models$? We present a rough map
over the various answers to these questions, and in the process, we introduce the 
main approaches. 

The various model-checking approaches provide a cornucopia of generic deci- 
sion procedures that can be applied to scenarios that go far beyond the problem 
domains for which the approaches were originally invented. (The work of some of 
the authors on casting data flow analysis questions as model-checking problems 
is an example [35].) We intend to provide a practitioner with a basis she can use

[Figure: three states labeled ($x=0$, $y=2$), ($x=1$, $y=1$), and ($x=2$, $y=0$); the transition arcs are not recoverable from the extracted text.]

Fig. 1. Example Kripke structure



to translate problems into model structures and formulas that can be solved by 
model checking. 

The rest of this article is organized as follows: In the next section we dis- 
cuss the model structures underlying model-checking — Kripke structures, labeled 
transition systems and a structure combining both called Kripke transition sys- 
tems. Section 3 surveys the spectrum of the logics used for specifying properties 
to be automatically checked for model structures. We then introduce three basic 
approaches to model-checking: the semantic or iterative approach, the automata- 
theoretic approach, and the tableau method. The paper finishes with a number 
of concluding remarks. 



2 Models 

Model-checking typically depends on a discrete model of a system — the system’s 
behavior is (abstractly) represented by a graph structure, where the nodes rep- 
resent the system’s states and the arcs represent possible transitions between the 
states. It is common to abstract from the identity of the nodes. Graphs alone 
are too weak to provide an interesting description, so they are annotated with 
more specific information. Two approaches are in common use: Kripke struc- 
tures, where the nodes are annotated with so-called atomic propositions, and 
labeled transition systems (LTS), where the arcs are annotated with so-called 
actions. We study these two structures and introduce a third called Kripke tran- 
sition systems, which combines Kripke structures and labeled transition systems 
and which is often more convenient for modeling purposes. 

Kripke Structures. A Kripke structure (KS) over a set $AP$ of atomic propositions 
is a triple $(S, R, I)$, where $S$ is a set of states, $R \subseteq S \times S$ is a transition relation, 
and $I : S \to 2^{AP}$ is an interpretation. Intuitively the atomic propositions, which 
formally are just symbols, represent basic local properties of system states; $I$ 
assigns to each state the properties enjoyed by it. We assume that a set of 
atomic propositions $AP$ always contains the propositions true and false and that, 
for any state $s$, $true \in I(s)$ and $false \notin I(s)$. A Kripke structure is called total 
if $R$ is a total relation, i.e., if for all $s \in S$ there is a $t \in S$ such that $(s,t) \in R$; 
otherwise it is called partial. For model-checking purposes $S$ and $AP$ are usually 
finite. 

Figure 1 displays an example Kripke structure whose propositions take the 
form, var = num; the structure represents the states that arise while the pro- 
gram’s components, x and y, trade two resources back and forth. 
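The definition can be made concrete with a small data structure. The following is a minimal Python sketch, not the authors' notation: the class name `KripkeStructure`, the state names `s0`, `s1`, `s2`, and in particular the transition relation are assumptions, since the arcs of Fig. 1 are not recoverable from the text (we assume one resource moves per step, in either direction). The always-present propositions true and false are left implicit.

```python
from dataclasses import dataclass


@dataclass
class KripkeStructure:
    states: set        # S
    transitions: set   # R, a set of (s, t) pairs with s, t in S
    labels: dict       # I, maps each state to its set of atomic propositions

    def successors(self, s):
        """All t with (s, t) in R."""
        return {t for (u, t) in self.transitions if u == s}

    def is_total(self):
        """True if every state has at least one successor."""
        return all(self.successors(s) for s in self.states)


# Assumed reading of Fig. 1: x and y hand one resource back and forth.
fig1 = KripkeStructure(
    states={"s0", "s1", "s2"},
    transitions={("s0", "s1"), ("s1", "s0"), ("s1", "s2"), ("s2", "s1")},
    labels={
        "s0": {"x=0", "y=2"},
        "s1": {"x=1", "y=1"},
        "s2": {"x=2", "y=0"},
    },
)
```

Under the assumed transitions the structure is total, so every finite path can be extended.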

Kripke structures were first devised as a model theory for modal logic [5,25], 
whose propositions use modalities that express necessity (“must”) and possibility 
(“may”). In this use of a Kripke structure, the states correspond to different 
“worlds” in which different basic facts (the atomic propositions) are true; tran- 
sitions represent reachability between worlds. The assertion that some fact is 
possibly true is interpreted to mean there is a reachable state (world) in which 
the fact holds; the assertion that a fact is necessarily true means that the fact 
holds in all reachable worlds. Kripke showed that the axioms and rules in dif- 
ferent systems of modal logics correspond to properties that hold in different 
classes of Kripke structures [28,29]. These logical settings for Kripke structures 
(in particular, the notion of “worlds” ) can provide useful guidance for expressing 
computing applications as Kripke structures [5,16]. 

Labeled Transition Systems. A labeled transition system (LTS) is a triple $T = 
(S, Act, \to)$, where $S$ is a set of states, $Act$ is a set of actions, and $\to \subseteq S \times Act \times S$ 
is a transition relation. A transition $(s, a, s') \in \to$, for which we adopt the more 
intuitive notation $s \xrightarrow{a} s'$, states that the system can evolve from state $s$ to 
state $s'$ thereby exchanging action $a$ with its environment. We call $s \xrightarrow{a} s'$ a 
transition from $s$ to $s'$ labeled by action $a$, and $s'$ is an $a$-successor of $s$. In an 
LTS, the transitions are labeled with single actions, while in a Kripke structure, 
states are labeled with sets of atomic propositions. Labeled transition systems 
originate from concurrency theory, where they are used as an operational model 
of process behavior [33]. In model-checking applications S and Act are usually 
finite. 

Fig. 3 displays two small examples of labeled transition systems that display 
the actions a vending machine might take. 

Kripke Transition Systems. Labels on arcs appear naturally when the labeling 
models the dynamics of a system, whereas labels on nodes appear naturally when 
the labeling models static properties of states. There are various ways to encode 
arc labelings by node labelings and vice versa. (One of them is described below.) 
And, logical considerations usually can be translated between these two repre- 
sentations. For these reasons, theoretical analyses study just one form of labeling. 
For modeling purposes, however, it is often natural to have both kinds of labeling 
available. Therefore, we introduce a third model structure that combines labeled 
transition systems and Kripke structures. 

A Kripke transition system (KTS) over a set $AP$ of atomic propositions is a 
structure $T = (S, Act, \to, I)$, where $S$ is a set of states, $Act$ is a set of actions, 
$\to \subseteq S \times Act \times S$ is a transition relation, and $I : S \to 2^{AP}$ is an interpretation. 
For technical reasons we assume that $AP$ and $Act$ are disjoint. Kripke transition 
systems generalize both Kripke structures and labeled transition systems: A 
Kripke structure is a Kripke transition system with an empty set of actions, 
$Act$, and a labeled transition system is a Kripke transition system with a trivial 
interpretation, $I$. 

Kripke transition systems work well for modeling sequential imperative pro- 
grams for data flow analysis purposes, as they concisely express the implied 
predicate transformer scheme: nodes express the predicates or results of the 
considered analysis, and edges labeled with the statements express the nodes’ 
interdependencies. If data flow analysis is performed via model-checking, Kripke 
transition systems thus make it possible to use the result of one analysis phase 
as input for the next one. 

Fig. 2. Two Kripke transition systems for a program 

Figure 2 shows two Kripke transition systems for the program, z:=0; i:=0; 
while i!=y do z:=z+x; i:=i+1 end. Both systems label arcs with program 
phrases. The first system uses properties that are logical propositions of the 
form, var = expr; it portrays a predicate-transformer semantics. The second 
system uses propositions that are program variables; it portrays the results of a 
definitely-live-variable analysis. 

Any Kripke transition system $T = (S, Act, \to, I)$ over $AP$ induces in a natural 
way a Kripke structure $K_T$ which codes the same information. The idea 
is to associate the information about the action exchanged in a transition with 
the reached state instead of the transition itself. This is similar to the classic 
translation of Mealy automata to Moore automata. Formally, $K_T$ is the Kripke 
structure $(S \times Act, R, I')$ over $AP \cup Act$ with 
$R = \{((s, a), (s', a')) \mid s \xrightarrow{a'} s'\}$ and $I'((s, a)) = I(s) \cup \{a\}$. 
Logical considerations about $T$ usually can straightforwardly 
be translated to considerations about $K_T$ and vice versa. Therefore, 
logicians usually prefer to work with the structurally simpler Kripke structures. 
Nevertheless, the richer framework of Kripke transition systems is often 
more convenient for modeling purposes. 
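The translation from $T$ to $K_T$ is mechanical, as the following Python sketch suggests. The function name `kts_to_ks` and the tuple-and-dict encoding are illustrative assumptions; the sketch follows the reading in which the action of a transition is recorded in the state that the transition reaches.

```python
def kts_to_ks(states, actions, trans, interp):
    """Translate a KTS into the induced Kripke structure K_T.

    states:  set of states S
    actions: set of actions Act
    trans:   set of (s, a, t) triples, one per transition s --a--> t
    interp:  dict mapping each state to its set of atomic propositions
    """
    # States of K_T are pairs (state, action that led here).
    ks_states = {(s, a) for s in states for a in actions}
    # A transition s --b--> t gives an arc from (s, a) to (t, b) for every a.
    ks_trans = {((s, a), (t, b)) for (s, b, t) in trans for a in actions}
    # The label of (s, a) is I(s) extended with the action a.
    ks_interp = {(s, a): set(interp[s]) | {a} for (s, a) in ks_states}
    return ks_states, ks_trans, ks_interp


# Hypothetical two-state example.
example = kts_to_ks(
    states={"p", "q"},
    actions={"a", "b"},
    trans={("p", "a", "q")},
    interp={"p": {"P"}, "q": set()},
)
```

In the example, the single transition of the KTS yields two arcs in $K_T$, one from each copy of the source state, and both end in the copy of `q` that is labeled with `a`.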

Often we may want to designate a certain state $s_0 \in S$ in a KS, LTS, or KTS 
as the initial state. Intuitively, execution of the system starts in this state. A 
structure together with such a designated initial state is called a rooted structure. 



3 Logics 



The interpretation, I, in a Kripke transition system defines local properties of 
states. Often we are also interested in global properties connected to the transi- 
tional behavior. For example, we might be interested in reachability properties, 
like, “Can we reach from the initial state a state where the atomic proposition 
P holds?” Temporal logics [17,36] are logical formalisms designed for expressing 
such properties. 

Fig. 3. Two vending machines 



Temporal logics come in two variants, linear-time and branching-time. Linear- 
time logics are concerned with properties of paths. A state in a transition system 
is said to satisfy a linear-time property if all paths emanating from this state 
satisfy the property. In a labeled transition system, for example, two states that 
generate the same language satisfy the same linear-time properties. Branching- 
time logics, on the other hand, describe properties that depend on the branching 
structure of the model. Two states that generate the same language but by us- 
ing different branching structures can often be distinguished by a branching-time 
formula. 

As an example, consider the two rooted, labeled transition systems in Fig. 3, 
which model two different vending machines offering tea and coffee. Both ma- 
chines serve coffee or tea after a coin has been inserted, but from the customer’s 
point of view the right machine is to be avoided, because it decides internally 
whether to serve coffee or tea. The left machine, in contrast, leaves this decision 
to the customer. Both machines have the same set of computations (maximal 
paths): {(coin, coffee), (coin, tea)}. Thus, a linear-time logic will be unable to 
distinguish the two machines. In a branching-time logic, however, the property, 
“a coffee action is possible after any coin action” can be expressed, which differ- 
entiates the two machines. 
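The difference between the two machines can be checked directly. The Python sketch below encodes both machines as sets of `(source, action, target)` triples; the state names and the helper functions `maximal_traces` and `coffee_after_any_coin` are assumptions made for illustration, since Fig. 3 itself is not reproduced here.

```python
# Left machine: the customer chooses after inserting the coin.
LEFT = {("l0", "coin", "l1"), ("l1", "coffee", "l2"), ("l1", "tea", "l3")}
# Right machine: the choice is made internally when the coin is inserted.
RIGHT = {("r0", "coin", "r1"), ("r0", "coin", "r2"),
         ("r1", "coffee", "r3"), ("r2", "tea", "r4")}


def maximal_traces(trans, state):
    """Action sequences of all maximal paths from `state` (acyclic LTS assumed)."""
    outgoing = [(a, t) for (s, a, t) in trans if s == state]
    if not outgoing:
        return {()}
    return {(a,) + rest
            for (a, t) in outgoing
            for rest in maximal_traces(trans, t)}


def coffee_after_any_coin(trans, state):
    """Branching-time property: every coin-successor has a coffee transition."""
    coin_successors = [t for (s, a, t) in trans if s == state and a == "coin"]
    return all(any(s == t and a == "coffee" for (s, a, _) in trans)
               for t in coin_successors)
```

Both machines produce the same set of maximal traces, yet only the left one passes the branching-time check.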

The choice of using a linear-time or a branching-time logic depends on the 
properties to be analyzed. Due to their greater selectivity, branching-time logics 
are often better for analyzing reactive systems. Linear-time logics are preferred 
when only path properties are of interest, as when analyzing data-flow properties 
of graphs of imperative programs. 



3.1 Linear-Time Logics 

Propositional linear-time logic (PLTL) is the basic prototypical linear-time logic. 
It is often presented in a form to be interpreted over Kripke structures. Its 
formulas are constructed as follows, where p ranges over a set AP of atomic 
propositions: 



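$$\phi ::= p \mid \neg\phi \mid \phi_1 \vee \phi_2 \mid X(\phi) \mid U(\phi, \psi) \mid F(\phi) \mid G(\phi)$$

$$\begin{array}{lcl}
\pi \models p & \text{iff} & p \in I(\pi_0) \\
\pi \models \neg\phi & \text{iff} & \pi \not\models \phi \\
\pi \models \phi_1 \vee \phi_2 & \text{iff} & \pi \models \phi_1 \text{ or } \pi \models \phi_2 \\
\pi \models X(\phi) & \text{iff} & |\pi| > 1 \text{ and } \pi^1 \models \phi \\
\pi \models U(\phi, \psi) & \text{iff} & \text{there is } k,\ 0 \le k < |\pi|, \text{ with } \pi^k \models \psi \text{ and for all } i,\ 0 \le i < k,\ \pi^i \models \phi \\
\pi \models F(\phi) & \text{iff} & \text{there is } k,\ 0 \le k < |\pi|, \text{ with } \pi^k \models \phi \\
\pi \models G(\phi) & \text{iff} & \text{for all } k \text{ with } 0 \le k < |\pi|,\ \pi^k \models \phi
\end{array}$$

Fig. 4. Semantics of PLTL 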



PLTL formulas are interpreted over paths in a Kripke structure $K = (S, R, I)$ 
over $AP$. A finite path is a finite, non-empty sequence $\pi = (\pi_0, \ldots, \pi_{n-1})$ of 
states $\pi_0, \ldots, \pi_{n-1} \in S$ such that $(\pi_i, \pi_{i+1}) \in R$ for all $0 \le i < n-1$. $n$ is 
called the length of the path, denoted by $|\pi|$. An infinite path is an infinite sequence 
$\pi = (\pi_0, \pi_1, \pi_2, \ldots)$ of states in $S$ such that $(\pi_i, \pi_{i+1}) \in R$ for all $i \ge 0$. The 
length of an infinite path is $\infty$. For $0 \le i < |\pi|$, $\pi_i$ denotes the $i$-th state in path 
$\pi$, and $\pi^i$ is $(\pi_i, \pi_{i+1}, \ldots)$, the tail of the path starting at $\pi_i$. In particular, 
$\pi^0 = \pi$. A path in a Kripke structure is called maximal if it cannot be extended. 
In particular, every infinite path is maximal. 

In Fig. 4, we present an inductive definition of when a path, $\pi$, in a Kripke 
structure $K = (S, R, I)$ satisfies a PLTL formula, $\phi$. Intuitively, $\pi$ satisfies an 
atomic proposition, $p$, if its first state does; atomic propositions represent local 
properties. $\neg$ and $\vee$ are interpreted in the obvious way; further Boolean connectives 
may be introduced as abbreviations in the usual way, e.g., $\phi_1 \wedge \phi_2$ can be 
introduced as $\neg(\neg\phi_1 \vee \neg\phi_2)$. 

The modality $X(\phi)$ (“next $\phi$”) requires the property $\phi$ for the next situation 
in the path; formally, $X(\phi)$ holds if $\phi$ holds for the path obtained by removing 
the first state. $G(\phi)$ (“generally $\phi$” or “always $\phi$”) requires $\phi$ to hold for all 
situations; $F(\phi)$ (“finally $\phi$”) for some (later) situation. Thus $G$ and $F$ provide a 
kind of universal (resp., existential) quantification over the later situations in a 
path. $U(\phi, \psi)$ (“$\phi$ until $\psi$”) requires $\psi$ to become true at some later situation and 
$\phi$ to be true at all situations visited before. This operator sometimes is called 
“strong until” because it requires $\psi$ to become true finally. This is different for 
a variant of the until modality, called “weak until,” because the formula holds 
true when $\phi$ is true forever. Strong- and weak-until can be defined from each 
other using $F$ and $G$: 

$$U(\phi, \psi) = WU(\phi, \psi) \wedge F(\psi) \quad\text{and}\quad WU(\phi, \psi) = U(\phi, \psi) \vee G(\phi) .$$

They are also (approximate) duals: 

$$\neg U(\phi, \psi) = WU(\neg\psi, \neg\phi \wedge \neg\psi) \quad\text{and}\quad \neg WU(\phi, \psi) = U(\neg\psi, \neg\phi \wedge \neg\psi) .$$

Fig. 5. Illustration of linear-time modalities 

Moreover, $F$ can easily be defined in terms of $U$, and $G$ in terms of $WU$, 

$$F(\phi) = U(true, \phi) \quad\text{and}\quad G(\phi) = WU(\phi, false) ,$$

and $F$ and $G$ are duals: 

$$F(\phi) = \neg G(\neg\phi) \quad\text{and}\quad G(\phi) = \neg F(\neg\phi) .$$

The meaning of the modalities is illustrated in Fig. 5. 
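The clauses of Fig. 4 translate almost literally into a recursive evaluator for finite paths. The Python sketch below is only an illustration: formulas are encoded as nested tuples, a path is a list of label sets, and the function name `holds` and the operator tags are assumptions. Infinite paths, the usual setting for model checking, are not covered by this encoding.

```python
def holds(path, f):
    """Does the finite, non-empty path (a list of sets of atomic
    propositions) satisfy the PLTL formula f (a nested tuple)?"""
    op = f[0]
    if op == "true":
        return True
    if op == "ap":                       # atomic proposition: check first state
        return f[1] in path[0]
    if op == "not":
        return not holds(path, f[1])
    if op == "or":
        return holds(path, f[1]) or holds(path, f[2])
    if op == "X":                        # next: drop the first state
        return len(path) > 1 and holds(path[1:], f[1])
    if op == "U":                        # strong until
        return any(holds(path[k:], f[2])
                   and all(holds(path[i:], f[1]) for i in range(k))
                   for k in range(len(path)))
    if op == "F":
        return any(holds(path[k:], f[1]) for k in range(len(path)))
    if op == "G":
        return all(holds(path[k:], f[1]) for k in range(len(path)))
    raise ValueError(f"unknown operator: {op}")


# Hypothetical example path with four states.
PATH = [{"p"}, {"p"}, {"q"}, set()]
P, Q = ("ap", "p"), ("ap", "q")
```

On `PATH`, `("U", P, Q)` holds because `q` becomes true at the third state and `p` holds before it, while `("G", P)` fails. The identities $F(\phi) = U(true, \phi)$ and $G(\phi) = \neg F(\neg\phi)$ can be spot-checked the same way.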

While the basic structures of a linear-time logic are paths, the model-checking 
question usually is presented for a given Kripke structure. The question is then 
to determine, for each state, whether all paths emanating from the state satisfy 
the formula. Sometimes, one restricts the question to certain kinds of paths, 
e.g., just infinite paths, or maximal finite paths only, or all (finite or infinite) 
maximal paths. Perhaps the most common case is to consider infinite paths in 
total Kripke structures. 

Variants of PLTL may also be defined for speaking about the actions in a KTS 
or LTS. A (finite) path in a KTS or LTS is then a non-empty alternating sequence 
$\pi = (s_0, a_1, s_1, \ldots, s_{n-1})$ of states and actions that begins and ends with a state 
and satisfies $s_i \xrightarrow{a_{i+1}} s_{i+1}$ for $0 \le i < n-1$. Again we call $|\pi| = n$ the length of 
path $\pi$, $\pi_i$ stands for $s_i$, and $\pi^i$ denotes the path $(s_i, a_{i+1}, s_{i+1}, \ldots, s_{n-1})$. Infinite 
paths are defined similarly. With these conventions, PLTL can immediately be 
interpreted on such extended paths with the definition in Fig. 4. We may now 
also extend the syntax of PLTL by allowing formulas of the form $(a)$, where $a$ 
is an action in $Act$. These formulas are interpreted as follows: 

$$\pi \models (a) \quad\text{iff}\quad |\pi| > 1 \text{ and } a_1 = a ,$$

where $a_1$ is the first action in $\pi$. 

3.2 Branching-Time Logics 

Hennessy-Milner Logic. Hennessy-Milner logic (HML) is a simple modal 
logic introduced by Hennessy and Milner in [24,33]. As far as model-checking is 
concerned, Hennessy-Milner logic is limited because it can express properties of 
only bounded depth. Nevertheless, it is of interest because it forms the core of 
the modal mu-calculus, which appears in the next section. 

Fig. 6. Illustration of branching-time modalities 

HML is defined over a given set, Act, of actions, ranged over by a. Formulas 
are constructed according to the grammar, 

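$$\phi ::= true \mid false \mid \phi_1 \wedge \phi_2 \mid \phi_1 \vee \phi_2 \mid [a]\phi \mid \langle a\rangle\phi$$

The logic is interpreted over labeled transition systems. Given an LTS $T = 
(S, Act, \to)$, we define inductively when state $s \in S$ satisfies HML formula $\phi$: 

$$\begin{array}{lcl}
s \models true & & \\
s \not\models false & & \\
s \models \phi_1 \wedge \phi_2 & \text{iff} & s \models \phi_1 \text{ and } s \models \phi_2 \\
s \models \phi_1 \vee \phi_2 & \text{iff} & s \models \phi_1 \text{ or } s \models \phi_2 \\
s \models [a]\phi & \text{iff} & \text{for all } t \text{ with } s \xrightarrow{a} t,\ t \models \phi \\
s \models \langle a\rangle\phi & \text{iff} & \text{there is } t \text{ with } s \xrightarrow{a} t \text{ and } t \models \phi
\end{array}$$
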

All states satisfy true and no state satisfies false. A state satisfies $\phi_1 \wedge \phi_2$ if 
it satisfies both $\phi_1$ and $\phi_2$; it satisfies $\phi_1 \vee \phi_2$ if it satisfies either $\phi_1$ or $\phi_2$ (or 
both). The most interesting operators of HML are the branching-time modalities 
$[a]$ and $\langle a\rangle$. They relate a state to its $a$-successors. While $[a]\phi$ holds for a state 
if all its $a$-successors satisfy formula $\phi$, $\langle a\rangle\phi$ holds if an $a$-successor satisfying 
formula $\phi$ exists. This is illustrated in Fig. 6. The two modalities provide a kind 
of universal and existential quantification over the $a$-successors of a state. 
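Because HML formulas have bounded depth, the satisfaction relation can be decided by plain structural recursion. The following Python sketch assumes a tuple encoding of formulas (`"box"` for $[a]$, `"dia"` for $\langle a\rangle$); the function name `hml_sat` and the three-transition example system are hypothetical.

```python
def hml_sat(trans, s, f):
    """Decide whether state s satisfies HML formula f.

    trans: set of (source, action, target) triples
    f:     nested tuple, e.g. ("box", "a", ("dia", "b", ("true",)))
    """
    op = f[0]
    if op == "true":
        return True
    if op == "false":
        return False
    if op == "and":
        return hml_sat(trans, s, f[1]) and hml_sat(trans, s, f[2])
    if op == "or":
        return hml_sat(trans, s, f[1]) or hml_sat(trans, s, f[2])
    if op in ("box", "dia"):
        successors = [t for (u, a, t) in trans if u == s and a == f[1]]
        if op == "box":   # all a-successors must satisfy the body
            return all(hml_sat(trans, t, f[2]) for t in successors)
        return any(hml_sat(trans, t, f[2]) for t in successors)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical LTS: s has two a-successors, only one of which can do b.
LTS = {("s", "a", "t"), ("s", "a", "u"), ("t", "b", "v")}
TRUE, FALSE = ("true",), ("false",)
```

In this system $\langle a\rangle\langle b\rangle true$ holds at `s` but $[a]\langle b\rangle true$ does not, and $[b] false$ holds at `s` because `s` has no $b$-successors at all.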

As introduced above, HML has as its atomic propositions only true and false. 
If HML is interpreted on a KTS $T = (S, Act, \to, I)$ over a certain set of atomic 
propositions, $AP$, we may add atomic formulas $p$ for each $p \in AP$. These formulas 
are interpreted as follows: 

$$s \models p \quad\text{iff}\quad p \in I(s)$$

Moreover, it is sometimes useful in practice to use modalities $[A]$ and $\langle A\rangle$ 
that range over sets of actions $A \subseteq Act$ instead of single actions. They can be 
introduced as derived operators: 

$$[A]\phi \stackrel{def}{=} \bigwedge_{a \in A} [a]\phi \qquad \langle A\rangle\phi \stackrel{def}{=} \bigvee_{a \in A} \langle a\rangle\phi .$$

We also write $[\,]$ for $[Act]$ and $\langle\,\rangle$ for $\langle Act\rangle$. A version of HML suitable for Kripke 
structures would provide just the modalities $[\,]$ and $\langle\,\rangle$. 

Modal Mu-Calculus. The modal mu-calculus [27] is a small, yet expressive 
branching-time temporal logic that extends Hennessy-Milner logic by fixpoint 
operators. Again it is defined over a set, $Act$, of actions. We also assume a given 
infinite set $Var$ of variables. Modal mu-calculus formulas are constructed according 
to the following grammar: 

$$\phi ::= true \mid false \mid [a]\phi \mid \langle a\rangle\phi \mid \phi_1 \wedge \phi_2 \mid \phi_1 \vee \phi_2 \mid X \mid \mu X.\phi \mid \nu X.\phi$$

Here, $X$ ranges over $Var$ and $a$ over $Act$. The two fixpoint operators, $\mu X$ and $\nu X$, 
bind free occurrences of variable $X$. We will apply the usual terminology of free 
and bound variables in a formula, closed and open formulas, etc. Given a least 
(resp., greatest) fixpoint formula, $\mu X.\phi$ ($\nu X.\phi$), we say that the $\mu$ ($\nu$) is the 
formula’s parity. 

The above grammar does not permit negations in formulas. Of course, negation 
is convenient for specification purposes, but negation-free formulas, known 
as formulas in positive form, are more easily handled by model-checkers. In most 
logics, formulas with negation can easily be transformed into equivalent formulas 
in positive form by driving negations inwards to the atomic propositions with 
duality laws like 

$$\neg\langle a\rangle\phi = [a]\neg\phi , \quad \neg(\phi_1 \wedge \phi_2) = \neg\phi_1 \vee \neg\phi_2 , \quad\text{and}\quad \neg(\mu X.\phi) = \nu X.\neg\phi[\neg X/X] .$$

But there is a small complication: We might end up with a subformula of the 
form $\neg X$ from which the negation cannot be eliminated. We avoid this problem 
if we pose this restriction on fixpoint formulas: in every fixpoint formula, $\mu X.\phi$ 
or $\nu X.\phi$, every free occurrence of $X$ in $\phi$ must appear under an even number 
of negations. This condition ensures also that the meaning of $\phi$ depends monotonically 
on $X$, which is important for the well-definedness of the semantics of 
fixpoint formulas. 

Modal mu-calculus formulas are interpreted over labeled transition systems. 
Given an LTS $T = (S, Act, \to)$, we interpret a closed formula, $\phi$, as that subset 
of $S$ whose states make $\phi$ true. To explain the meaning of open formulas, we 
employ environments, partial mappings $\rho : Var \to 2^S$, which interpret the free 
variables of $\phi$ by subsets of $S$; $\rho(X)$ represents an assumption about the set of 
states satisfying the formula $X$. The inductive definition of $M_T(\phi)(\rho)$, the set 
of states of $T$ satisfying the mu-calculus formula $\phi$ w.r.t. environment $\rho$, is given 
in Fig. 7. The meaning of a closed formula does not depend on the environment. 
We write, for a closed formula $\phi$ and a state $s \in S$, $s \models \phi$ if $s \in M_T(\phi)(\rho)$ for 
one (and therefore for all) environments. 

Intuitively, true and false hold for all, resp., no states, and $\wedge$ and $\vee$ are 
interpreted by conjunction and disjunction. As in HML, $\langle a\rangle\phi$ holds for a state 
$s$ if there is an $a$-successor of $s$ which satisfies $\phi$, and $[a]\phi$ holds for $s$ if all its 
$a$-successors satisfy $\phi$. The interpretation of a variable $X$ is as prescribed by the 
environment. The least fixpoint formula, $\mu X.\phi$, is interpreted by the smallest 
subset $x$ of $S$ that recurs when $\phi$ is interpreted with the substitution of $x$ for $X$. 
Similarly, the greatest fixpoint formula, $\nu X.\phi$, is interpreted by the largest such 
set. These sets can be characterized as the least and greatest fixpoints, $\mathrm{fix}_\mu F_{\phi,\rho}$ 
and $\mathrm{fix}_\nu F_{\phi,\rho}$, of the functional $F_{\phi,\rho} : 2^S \to 2^S$ defined by 

$$F_{\phi,\rho}(x) = M_T(\phi)(\rho[X \mapsto x])$$

Here, $\rho[X \mapsto x]$ denotes, for a set $x \subseteq S$ and a variable $X \in Var$, the environment 
that maps $X$ to $x$ and that coincides on the other variables with $\rho$. We now must 
review the basic theory of fixpoints. 

$$\begin{array}{lcl}
M_T(true)(\rho) & = & S \\
M_T(false)(\rho) & = & \emptyset \\
M_T([a]\phi)(\rho) & = & \{s \mid \forall s' : s \xrightarrow{a} s' \Rightarrow s' \in M_T(\phi)(\rho)\} \\
M_T(\langle a\rangle\phi)(\rho) & = & \{s \mid \exists s' : s \xrightarrow{a} s' \wedge s' \in M_T(\phi)(\rho)\} \\
M_T(\phi_1 \wedge \phi_2)(\rho) & = & M_T(\phi_1)(\rho) \cap M_T(\phi_2)(\rho) \\
M_T(\phi_1 \vee \phi_2)(\rho) & = & M_T(\phi_1)(\rho) \cup M_T(\phi_2)(\rho) \\
M_T(X)(\rho) & = & \rho(X) \\
M_T(\mu X.\phi)(\rho) & = & \mathrm{fix}_\mu F_{\phi,\rho} \\
M_T(\nu X.\phi)(\rho) & = & \mathrm{fix}_\nu F_{\phi,\rho}
\end{array}$$

Fig. 7. Semantics of modal mu-calculus 



3.3 Fixpoints in Complete Lattices 

A convenient structure accommodating fixpoint construction is the complete 
lattice, i.e., a non-empty, partially ordered set in which arbitrary meets and joins 
exist. We assume the reader is familiar with the basic facts of complete lattices 
(for a thorough introduction see the classic books of Birkhoff and Grätzer [3,21]). 

We now recall some definitions and results directly related to fixpoint theory. 
Let $(A, \le_A)$ and $(B, \le_B)$ be complete lattices and $C$ a subset of $A$. $C$ is a chain 
if it is non-empty and any two elements of $C$ are comparable with respect to 
$\le_A$. A mapping $f \in (A \to B)$ is monotonic if $a \le_A a'$ implies $f(a) \le_B f(a')$ 
for all $a, a' \in A$. The mapping is $\bigvee$-continuous if it distributes over chains, i.e., 
for all chains $C \subseteq A$, $f(\bigvee C) = \bigvee\{f(c) \mid c \in C\}$. The notion of $\bigwedge$-continuity is 
defined dually. Both $\bigvee$- and $\bigwedge$-continuity of a function imply monotonicity. $\bot$ 
and $\top$ denote the smallest and largest elements of a complete lattice. Finally, a 
point $a \in A$ is called a fixpoint of a function $f \in (A \to A)$ if $f(a) = a$. It is a 
pre-fixpoint of $f$ if $f(a) \le a$ and a post-fixpoint if $a \le f(a)$. 

Suppose that $f : A \to A$ is a monotonic mapping on a complete lattice 
$(A, \le)$. The central result of fixpoint theory is the following [40,31]: 

Theorem 1 (Knaster-Tarski fixpoint theorem). If $f : A \to A$ is a monotonic 
mapping on a complete lattice $(A, \le)$, then $f$ has a least fixpoint $\mathrm{fix}_\mu f$ 
as well as a greatest fixpoint $\mathrm{fix}_\nu f$ which can be characterized as the smallest 
pre-fixpoint and largest post-fixpoint respectively: 

$$\mathrm{fix}_\mu f = \bigwedge\{a \mid f(a) \le a\} \quad\text{and}\quad \mathrm{fix}_\nu f = \bigvee\{a \mid a \le f(a)\} .$$

For continuous functions there is a “constructive” characterization that 
constructs the least and greatest fixpoints by iterated application of the function to 
the smallest (greatest) element of the lattice [26]. The iterated application of $f$ 
is inductively defined by the two equations $f^0(a) = a$ and $f^{i+1}(a) = f(f^i(a))$. 

Theorem 2 (Kleene fixpoint theorem). For complete lattice $(A, \le)$, if $f : 
A \to A$ is $\bigvee$-continuous, then its least fixpoint is the join of this chain: 

$$\mathrm{fix}_\mu f = \bigvee\{f^i(\bot) \mid i \ge 0\} .$$

Dually, if $f$ is $\bigwedge$-continuous, its greatest fixpoint is the meet of this chain: 

$$\mathrm{fix}_\nu f = \bigwedge\{f^i(\top) \mid i \ge 0\} .$$

In the above characterization we have $f^0(\bot) \le f^1(\bot) \le f^2(\bot) \le \cdots$ and, 
dually, $f^0(\top) \ge f^1(\top) \ge f^2(\top) \ge \cdots$. As any monotonic function on a finite 
lattice is both $\bigvee$-continuous and $\bigwedge$-continuous, this lets us effectively calculate 
least and greatest fixpoints for arbitrary monotonic functions on finite complete 
lattices: The least fixpoint is found with the smallest value of $i$ such that 
$f^i(\bot) = f^{i+1}(\bot)$; the greatest fixpoint is calculated similarly. This observation 
underlies the semantic approach to model-checking described in Sect. 4.3. 
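On a finite powerset lattice this calculation is a short loop. The Python sketch below is an illustration under stated assumptions: the helper `iterate` and the two example functions are hypothetical, and termination is only guaranteed when `f` is monotonic and the iteration starts at the bottom element (for a least fixpoint) or the top element (for a greatest fixpoint) of a finite lattice.

```python
def iterate(f, start):
    """Apply f repeatedly, beginning at `start`, until the value repeats."""
    x = start
    while True:
        y = f(x)
        if y == x:
            return x
        x = y


# Hypothetical example on the lattice of subsets of {0, 1, 2, 3}.
EDGES = {(0, 1), (1, 2), (3, 3)}
ALL = frozenset({0, 1, 2, 3})


def reach_step(x):
    """Monotonic: state 0 together with all successors of states in x."""
    return frozenset({0}) | frozenset(t for (s, t) in EDGES if s in x)


def live_step(x):
    """Monotonic: states of x that have a successor inside x."""
    return frozenset(s for s in x if any(u == s and t in x for (u, t) in EDGES))


reachable = iterate(reach_step, frozenset())   # least fixpoint, from bottom
can_run_forever = iterate(live_step, ALL)      # greatest fixpoint, from top
```

The first iteration grows from the empty set to the states reachable from 0; the second shrinks from the full set to the states that start an infinite path.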

The following variant of Kleene’s fixpoint theorem shows that the iteration 
can be started with any value below the least or above the greatest fixpoint; 
it is not necessary to take the extremal values $\bot$ and $\top$. This observation can 
be exploited to speed up the fixpoint calculation if a safe approximation is already 
known. In particular, it can be used when model-checking nested fixpoint 
formulas of the same parity (see Sect. 4.3). 

Theorem 3 (Variant of Kleene’s fixpoint theorem). Suppose that $f : A \to 
A$ is a $\bigvee$-continuous function on complete lattice $(A, \le)$, and $a \in A$. If $a \le \mathrm{fix}_\mu f$, 
then $\mathrm{fix}_\mu f = \bigvee\{f^i(a) \mid i \ge 0\}$. 

Dually, if $f$ is $\bigwedge$-continuous and $\mathrm{fix}_\nu f \le a$, then $\mathrm{fix}_\nu f = \bigwedge\{f^i(a) \mid i \ge 0\}$. 

In the context of the modal mu-calculus, the fixpoint theorems are applied 
to the complete lattice, $(2^S, \subseteq)$, of the subsets of $S$ ordered by set-inclusion. 
With the expected ordering on environments ($\rho \sqsubseteq \rho'$ iff $\mathrm{dom}\,\rho = \mathrm{dom}\,\rho'$ and 
$\rho(X) \subseteq \rho'(X)$ for all $X \in \mathrm{dom}\,\rho$), we can prove that $F_{\phi,\rho}$ must be monotonic. 
Thus, the Knaster-Tarski fixpoint theorem ensures the existence of $\mathrm{fix}_\mu F_{\phi,\rho}$ and 
$\mathrm{fix}_\nu F_{\phi,\rho}$ and gives us the following two equations that are often used for defining 
semantics of fixpoint formulas: 

$$M_T(\mu X.\phi)(\rho) = \bigcap\{x \subseteq S \mid M_T(\phi)(\rho[X \mapsto x]) \subseteq x\}$$

$$M_T(\nu X.\phi)(\rho) = \bigcup\{x \subseteq S \mid M_T(\phi)(\rho[X \mapsto x]) \supseteq x\}$$

For finite-state transition systems, we have the Kleene fixpoint theorem: 

$$M_T(\mu X.\phi)(\rho) = \bigcup\{F^i_{\phi,\rho}(\emptyset) \mid i \ge 0\}$$

$$M_T(\nu X.\phi)(\rho) = \bigcap\{F^i_{\phi,\rho}(S) \mid i \ge 0\}$$

These characterizations are central for the semantic approach to model-checking 
described in Sect. 4.3. 

It is simple to extend the modal mu-calculus to work on Kripke transition 
systems instead of labeled transition systems: We allow the underlying atomic 
propositions $p \in AP$ as atomic formulas $p$. The semantic clause for these formulas 
looks as follows: 

$$M_T(p)(\rho) = \{s \in S \mid p \in I(s)\} .$$

If we replace the modalities $[a]$ and $\langle a\rangle$ by $[\,]$ and $\langle\,\rangle$ we obtain a version of the 
modal mu-calculus that fits pure Kripke structures as model structures. 
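For a finite Kripke transition system, the semantic clauses together with the Kleene characterization yield a direct, if naive, global evaluator. The Python sketch below is an assumption-laden illustration rather than an optimized algorithm: the function name `mu_eval`, the tuple encoding of formulas, and the convention that action `None` stands for the unlabeled modalities $[\,]$ and $\langle\,\rangle$ are all hypothetical. Inner fixpoints are recomputed from scratch for every outer iteration, which is simple but can be expensive for deeply nested formulas.

```python
def mu_eval(states, trans, interp, f, env=None):
    """Set of states satisfying mu-calculus formula f under environment env.

    states: finite set S;  trans: set of (s, a, t) triples
    interp: dict state -> set of atomic propositions
    f:      nested tuple; env: dict variable -> set of states
    """
    env = env or {}
    op = f[0]
    if op == "true":
        return set(states)
    if op == "false":
        return set()
    if op == "ap":
        return {s for s in states if f[1] in interp.get(s, ())}
    if op == "var":
        return set(env[f[1]])
    if op == "and":
        return (mu_eval(states, trans, interp, f[1], env)
                & mu_eval(states, trans, interp, f[2], env))
    if op == "or":
        return (mu_eval(states, trans, interp, f[1], env)
                | mu_eval(states, trans, interp, f[2], env))
    if op in ("box", "dia"):
        target = mu_eval(states, trans, interp, f[2], env)
        result = set()
        for s in states:
            succ = [t for (u, a, t) in trans
                    if u == s and (f[1] is None or a == f[1])]
            quantifier = all if op == "box" else any
            if quantifier(t in target for t in succ):
                result.add(s)
        return result
    if op in ("mu", "nu"):
        # Kleene iteration: start at the empty set (mu) or at S (nu).
        x = set() if op == "mu" else set(states)
        while True:
            nxt = mu_eval(states, trans, interp, f[2], {**env, f[1]: x})
            if nxt == x:
                return x
            x = nxt
    raise ValueError(f"unknown operator: {op}")


# Hypothetical example system and formulas.
S = {0, 1, 2, 3}
TRANS = {(0, "a", 1), (1, "a", 2), (3, "a", 3)}
INTERP = {2: {"p"}}
EF_P = ("mu", "X", ("or", ("ap", "p"), ("dia", None, ("var", "X"))))
INFINITE = ("nu", "X", ("dia", None, ("var", "X")))
```

Here `EF_P` evaluates to the states from which a `p`-state is reachable, and `INFINITE` to the states that start an infinite path.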



Computational Tree Logic. Computational Tree Logic (CTL) was the first 
temporal logic for which an efficient model-checking procedure was proposed 
[11]. Its syntax looks as follows: 

$$\phi ::= p \mid \neg\phi \mid \phi_1 \vee \phi_2 \mid AU(\phi, \psi) \mid EU(\phi, \psi) \mid AF(\phi) \mid EF(\phi) \mid AG(\phi) \mid EG(\phi)$$

CTL has the six modalities AU, EU, AF, EF, AG, EG. Each takes the form QL, 
where Q is one of the path quantifiers A and E, and L is one of the linear-time 
modalities U, F, and G. The path quantifier provides a universal (A) or existential 
(E) quantification over the paths emanating from a state, and on these paths the 
corresponding linear-time property must hold. For example, the formula $EF(\phi)$ 
is true for a state, $s$, if there is a path, $\pi$, starting in $s$ on which $\phi$ becomes 
true at some later situation; i.e., the path $\pi$ has to satisfy $\pi \models F(\phi)$ in the sense 
of PLTL. In contrast, $AF(\phi)$ holds if on all paths starting in $s$, $\phi$ becomes true 
finally. 

The meaning of the CTL modalities can be expressed by means of fixpoint 
formulas. In this sense, CTL provides useful abbreviations for frequently used 
formulas of the modal mu-calculus. Here are the fixpoint definitions of the U 
modalities: 

$$AU(\phi, \psi) \stackrel{def}{=} \mu X.(\psi \vee (\phi \wedge [\,]X \wedge \langle\,\rangle true))$$

$$EU(\phi, \psi) \stackrel{def}{=} \mu X.(\psi \vee (\phi \wedge \langle\,\rangle X)) .$$



The F modalities can easily be expressed by the U modalities 

$$AF(\phi) \stackrel{def}{=} AU(true, \phi) \qquad EF(\phi) \stackrel{def}{=} EU(true, \phi)$$

and the G modalities are easily defined as the duals of the F modalities: 

$$AG(\phi) \stackrel{def}{=} \neg EF(\neg\phi) \qquad EG(\phi) \stackrel{def}{=} \neg AF(\neg\phi) .$$

By unfolding these definitions, direct fixpoint characterizations of the F and G 
modalities can easily be obtained. 
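These definitions can be executed as they stand on a finite Kripke structure. In the Python sketch below, subformulas are represented by the sets of states that satisfy them; the function names and the four-state example are assumptions, and the fixpoints are computed by straightforward Kleene iteration from the empty set.

```python
def fix(f, start):
    """Iterate f from `start` until the value no longer changes."""
    x = start
    while f(x) != x:
        x = f(x)
    return x


def dia(R, X):
    """<>X: states with at least one successor in X."""
    return {s for (s, t) in R if t in X}


def box(S, R, X):
    """[]X: states all of whose successors lie in X."""
    return {s for s in S if all(t in X for (u, t) in R if u == s)}


def EU(S, R, phi, psi):
    return fix(lambda X: psi | (phi & dia(R, X)), set())


def AU(S, R, phi, psi):
    has_successor = dia(R, S)   # <>true
    return fix(lambda X: psi | (phi & box(S, R, X) & has_successor), set())


def EF(S, R, phi):
    return EU(S, R, set(S), phi)


def AF(S, R, phi):
    return AU(S, R, set(S), phi)


def AG(S, R, phi):
    return set(S) - EF(S, R, set(S) - phi)


def EG(S, R, phi):
    return set(S) - AF(S, R, set(S) - phi)


# Hypothetical total Kripke structure; P holds exactly in state 2.
STATES = {0, 1, 2, 3}
ARCS = {(0, 1), (0, 3), (1, 2), (2, 2), (3, 3)}
P = {2}
```

State 0 satisfies $EF(P)$ via the path through 1, but not $AF(P)$, because the path through 3 never reaches a $P$-state.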

The above described version of CTL operates on pure Kripke structures. 
For LTSes and KTSes it is less useful, as it does not specify anything about 
the labels on arcs. We might extend CTL’s modalities by relativizing them with 
respect to sets of actions $A \subseteq Act$: the path quantifiers consider only those paths 
whose actions come from $A$; all other paths are disregarded. In the following 
system, e.g., 

(Diagram lost in extraction; it contains the path $(s, a, t, b, u)$ and a $c$-labeled arc.) 

the state $s$ satisfies $AF_{\{a,b\}}(P)$, as the path $(s, a, t, b, u)$ is taken into account. 
But $s$ does not satisfy $AF_{\{a,c\}}(P)$ as here only the path $(s, a, t)$ is considered. 
Again these modalities can be defined by fixpoint formulas, for example: 

$$AG_A(\phi) = \nu X.\,\phi \wedge [A]X \quad\text{and}\quad EF_A(\phi) = \mu Y.\,\phi \vee \langle A\rangle Y .$$

4 Model Checking 

4.1 Global vs. Local Model-Checking 

There are two ways in which the model-checking problem can be specified: 

Global model-checking problem: Given a finite model structure, $M$, 
and a formula, $\phi$, determine the set of states in $M$ that satisfy $\phi$. 

Local model-checking problem: Given a finite model structure, $M$, 
a formula, $\phi$, and a state $s$ in $M$, determine whether $s$ satisfies $\phi$. 

While the local model-checking problem must determine modelhood of a 
single state, the global problem must decide modelhood for all the states in the 
structure. Obviously, solution of the global model-checking problem comprises 
solution of the local problem, and solving the local model-checking problem 
for each state in the structure solves the global model-checking problem. Thus, 
the two problems are closely related, but global and local model-checkers have 
different applications. 

For example, a classic application of model-checking is the verification of 
properties of models of hardware systems, where the hardware system contains 
many parallel components whose interaction is modeled by interleaving. The 
system’s model structure grows exponentially with the number of parallel 
components, a problem known as the state-explosion problem. (A similar problem 
arises when software systems, with variables ranging over a finite domain, are 
analyzed — the state space grows exponentially with the number of variables.) 

In such an application, local model-checking is usually preferred, because the 
property of interest is often expressed with respect to a specific initial state; a 
local model checker might inspect only a small part of the structure to decide 
the problem, and the part of the structure that is not inspected need not even 
be constructed. Thus, local model-checking is one means for fighting the 
state-explosion problem. 

                              Branching-time   Linear-time   Global   Local 
Semantic methods                    X                           X 
Automata-theoretic methods                          X 
Tableau methods                     X               X                    X 

Fig. 8. Classification of model-checking approaches (the Global/Local entry for 
automata-theoretic methods did not survive extraction) 

For other applications, like the use of model-checking for data-flow analysis, 
one is really interested in solving the global question, as the very purpose of the 
model-checking activity is to gain knowledge about all the states of a structure. 
(For example, in Figure 2, the structure is the flow graph of the program to be 
analyzed, and the property specified by a formula might be whether a certain 
variable is “definitely live.”) Such applications use structures that are rather 
small in comparison to those arising in verification activities, and the state- 
explosion problem holds less importance. Global methods are preferred in such 
situations. 



4.2 Model-Checking Approaches 

Model checking can be implemented by several different approaches; prominent 
examples are the semantic approach, the automata-theoretic approach, and the 
tableau approach. 

The idea behind the semantic (or iterative) approach is to inductively com- 
pute the semantics of the formula in question on the given finite model. This 
generates a global model-checker and works well for branching-time logics like 
the modal mu-calculus and CTL. Modalities are reduced to their fixpoint defi- 
nitions, and fixpoints are computed by applying the Kleene fixpoint theorem to 
the finite domain of the state powerset. 

The automata-theoretic approach is mainly used for linear-time logics; it 
reduces the model-checking problem to an inclusion problem between automata. 
An automaton, $A_\phi$, is constructed from formula, $\phi$; $A_\phi$ accepts the paths satisfying 
$\phi$. Another automaton, $A_M$, is constructed from the model, $M$, and accepts 
the paths exhibited by the model. $M$ satisfies $\phi$ iff $L(A_M) \subseteq L(A_\phi)$. This problem 
can in turn be reduced to the problem of deciding non-emptiness of a product 
automaton, which is possible by a reachability analysis. 
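The reduction from inclusion to emptiness can be spelled out in one line. The following is a sketch under two assumptions that the text does not state explicitly: that an automaton $A_{\neg\phi}$ for the negated formula accepts exactly the complement of $L(A_\phi)$, and that the product automaton $A_M \times A_{\neg\phi}$ accepts the intersection of the two languages.

```latex
\[
  L(A_M) \subseteq L(A_\phi)
  \iff L(A_M) \cap \overline{L(A_\phi)} = \emptyset
  \iff L(A_M \times A_{\neg\phi}) = \emptyset .
\]
```

Read this way, $M$ violates $\phi$ exactly when the product automaton accepts some path, and such a path serves as a counterexample.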

The tableau method solves the local model-checking problem by subgoaling. 
Essentially, one tries to construct a proof tree that witnesses that the given state 
has the given property. If no proof tree can be found, this provides a disproof of 
the property for the given state. Since the tableau method intends to inspect only 
a small fraction of the state space, it combines well with incremental construction 
of the state space, which is a prominent approach for fighting state explosion. 

Figure 8 presents typical profiles of the three approaches along the axes of 
branching- vs. linear-time and global vs. local model-checking. The classification 
is, of course, to be understood cum grano salis. Applications of the methods 
for other scenarios are also possible but less common. In the remainder of this 
section, we describe each of these approaches in more detail. 



4.3 Semantic Approach 

Based on the iterative characterization of fixpoints, the semantics of modal mu- 
calculus formulas can be effectively evaluated on finite-state Kripke transition 
systems. But in general, this is quite difficult, due to the potential interference 
between least and greatest fixpoints. As we see later in this section, the alternated 
nesting of least and greatest fixpoints forces us to introduce backtracking into 
the fixpoint iteration procedure, which causes an exponential worst-case time 
complexity of iterative model checking for the full mu-calculus. Whether this 
is a tight bound for the model-checking problem as such is a challenging open 
problem. 

Before we explain the subtleties of alternation, we illustrate iterative model 
checking for formulas whose fixpoint subformulas contain no free variables. In 
this case, the iteration process can be organized in a hierarchical fashion, giving 
a decision procedure whose worst-case time complexity is proportional to the 
size of the underlying finite-state system and the size of the formula: 



1. Associate all variables belonging to greatest fixpoints with the full set of
   states, $S$, and all variables belonging to least fixpoints with $\emptyset$.

2. Choose a subformula, $\phi = \mu X.\phi'$ (or $\nu X.\phi'$), where $\phi'$ is fixpoint free,
   determine its semantics, and replace it by an atomic proposition, $A_\phi$, whose
   valuation is defined by its semantics.

3. Repeat the second step until the whole formula is processed.



We illustrate this hierarchical procedure for $\phi = AG_{\{b\}}(EF_{\{b\}}(\langle a \rangle \mathit{true}))$ and
the following transition system $T$:

[Figure: transition system $T$ with states s, t, u, v and transitions labeled a and b]

Intuitively, $\phi$ describes the property that from all states reachable via b-transitions,
an a-transition is finitely reachable along a path of b-transitions; u and v enjoy
this property while s and t do not.

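Unfolding the CTL-like operators in $\phi$ using the corresponding fixpoint
definitions, we obtain

$$\phi = \nu X.\,(\mu Y.\,\langle a \rangle \mathit{true} \lor \langle b \rangle Y) \land [b]X .$$

A hierarchical model-checker first evaluates $\mu Y.\,\langle a \rangle \mathit{true} \lor \langle b \rangle Y$, the inner
fixpoint formula: Letting $\phi_Y$ denote the formula $\langle a \rangle \mathit{true} \lor \langle b \rangle Y$, we have, for any
environment $\rho$,

$$\mathcal{M}_T(\mu Y.\phi_Y)(\rho) = \mathrm{fix}\, F_{\phi_Y,\rho} .$$

Now, by the Kleene fixpoint theorem this fixpoint formula can be calculated
by iterated application of $F_{\phi_Y,\rho}$ to the smallest element in $2^S$, $\emptyset$. Here are the
resulting approximations:

$$
\begin{aligned}
F^0_{\phi_Y,\rho}(\emptyset) &= \emptyset \\
F^1_{\phi_Y,\rho}(\emptyset) &= \mathcal{M}(\langle a \rangle \mathit{true})(\rho[Y \mapsto \emptyset]) \cup \mathcal{M}(\langle b \rangle Y)(\rho[Y \mapsto \emptyset]) \\
 &= \{u\} \cup \emptyset = \{u\} \\
F^2_{\phi_Y,\rho}(\emptyset) &= F_{\phi_Y,\rho}(\{u\}) \\
 &= \mathcal{M}(\langle a \rangle \mathit{true})(\rho[Y \mapsto \{u\}]) \cup \mathcal{M}(\langle b \rangle Y)(\rho[Y \mapsto \{u\}]) \\
 &= \{u\} \cup \{t,u,v\} = \{t,u,v\} \\
F^3_{\phi_Y,\rho}(\emptyset) &= F_{\phi_Y,\rho}(\{t,u,v\}) \\
 &= \mathcal{M}(\langle a \rangle \mathit{true})(\rho[Y \mapsto \{t,u,v\}]) \cup \mathcal{M}(\langle b \rangle Y)(\rho[Y \mapsto \{t,u,v\}]) \\
 &= \{u\} \cup \{t,u,v\} = \{t,u,v\} .
\end{aligned}
$$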

Thus, $\{t,u,v\}$ is the meaning of $\mu Y.\phi_Y$ in any environment. Next, the
hierarchical model-checker evaluates the formula $\phi' = \nu X.\, p_Y \land [b]X$, where $p_Y$ is a
new atomic proposition that holds true for the states t, u, and v. Again, this is
done by iteration that starts this time with $S = \{s,t,u,v\}$, as we are confronted
with a greatest fixpoint. Let $\phi_X$ denote the formula $p_Y \land [b]X$. The iteration's
results look as follows:

$$
\begin{aligned}
F^0_{\phi_X,\rho}(S) &= S \\
F^1_{\phi_X,\rho}(S) &= \mathcal{M}(p_Y)(\rho[X \mapsto S]) \cap \mathcal{M}([b]X)(\rho[X \mapsto S]) \\
 &= \{t,u,v\} \cap S = \{t,u,v\} \\
F^2_{\phi_X,\rho}(S) &= F_{\phi_X,\rho}(\{t,u,v\}) \\
 &= \mathcal{M}(p_Y)(\rho[X \mapsto \{t,u,v\}]) \cap \mathcal{M}([b]X)(\rho[X \mapsto \{t,u,v\}]) \\
 &= \{t,u,v\} \cap \{s,u,v\} = \{u,v\} \\
F^3_{\phi_X,\rho}(S) &= F_{\phi_X,\rho}(\{u,v\}) \\
 &= \mathcal{M}(p_Y)(\rho[X \mapsto \{u,v\}]) \cap \mathcal{M}([b]X)(\rho[X \mapsto \{u,v\}]) \\
 &= \{t,u,v\} \cap \{s,u,v\} = \{u,v\} .
\end{aligned}
$$

The model-checking confirms our expectation that just states u and v have
property $\phi$.
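
The two iterations above can be replayed mechanically. The following Python sketch is only an illustration of the hierarchical procedure, not the authors' implementation. The transition relation `TRANS` is an assumption: it is one four-state system chosen so that the computed approximations agree with the ones listed in the text. The helper names `dia`, `box`, and `kleene` are likewise hypothetical.

```python
# Assumed LTS, chosen to be consistent with the approximations in the text.
STATES = frozenset({"s", "t", "u", "v"})
TRANS = {
    ("t", "b", "s"), ("t", "b", "u"),
    ("u", "a", "v"), ("u", "b", "u"),
    ("v", "b", "u"),
}

def dia(action, target):
    """<action>target: states with some action-successor in target."""
    return frozenset(p for (p, a, q) in TRANS if a == action and q in target)

def box(action, target):
    """[action]target: states all of whose action-successors lie in target."""
    return frozenset(
        p for p in STATES
        if all(q in target for (p2, a, q) in TRANS if p2 == p and a == action)
    )

def kleene(f, start):
    """Iterate f from start until it stabilizes; return all approximations."""
    approx = [start]
    while True:
        nxt = f(approx[-1])
        if nxt == approx[-1]:
            return approx
        approx.append(nxt)

# Inner least fixpoint mu Y. <a>true \/ <b>Y, started at the empty set.
inner = kleene(lambda y: dia("a", STATES) | dia("b", y), frozenset())
p_y = inner[-1]  # valuation of the new atomic proposition p_Y

# Outer greatest fixpoint nu X. p_Y /\ [b]X, started at the full state set.
outer = kleene(lambda x: p_y & box("b", x), STATES)

print([sorted(x) for x in inner])
print([sorted(x) for x in outer])
```

Because the inner formula has no free variables, it is evaluated once and then treated as an atomic proposition, which is exactly what keeps the procedure linear in the nesting structure.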

In the above example, the inner fixpoint formula, $\mu Y.\,\langle a \rangle \mathit{true} \lor \langle b \rangle Y$, does
not use the variable introduced by the outer fixpoint operator, $X$. Therefore,
its value does not depend on the environment, $\rho$. This is always the case for
CTL-like formulas and enables the hierarchical approach to work correctly. If,
however, the inner fixpoint depends on the variable introduced further outwards,
we must, at least in principle, evaluate the inner fixpoint formula again and
again in each iteration of the outer formula. Fortunately, if the fixpoint formulas
have the same parity, i.e., they are either both least fixpoint formulas or both
greatest fixpoint formulas, we can avoid the problem and correctly compute the
values of the inner and outer formulas simultaneously, because the value of a
fixpoint formula depends monotonically on the value of its free variables and the
iterations of both formulas proceed in the same direction.

In the case of mutual dependencies between least and greatest fixpoints, 
however, the iterations proceed in opposite directions, which excludes a simple 
monotonic iteration process. Such formulas are called alternating fixpoint formu- 
las. They require backtracking (or resets) in the iteration process. The following 
is a minimal illustrative example. 

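Example 1. Consider the formula $\psi = \nu Z.\,\mu Y.\,(\langle b \rangle Z \lor \langle a \rangle Y)$, which intuitively
specifies that there is a path consisting of a- and b-steps with infinitely many
b-steps. We would like to check it for the following LTS:

[Figure: LTS with states s and t; t has a b-transition to s and an a-labeled self-loop]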

Here are the results of the iterations for the outer fixpoint variable Z with
the nested iterations for Y:

    Iteration for Z      1             2         3
    Assumption for Z     {s,t}         {t}       ∅
    Iterations for Y     ∅ {t} {t}     ∅ ∅       ∅ ∅

Thus, we correctly calculate that neither s nor t satisfies $\psi$.

If, however, we do not reset the iterations of Y to ∅ in each iteration of Z but
simply start Y's iterations with the old approximations, we produce the wrong
result {t} that "stabilizes" itself:

    Iteration for Z      1             2
    Assumption for Z     {s,t}         {t}
    Iterations for Y     ∅ {t} {t}     {t} {t}

Due to the nested iteration, the complexity of model-checking formulas with
alternation is high. A careful implementation of the iterative approach leads to
an asymptotic worst-case running time of $O((|T| \cdot |\phi|)^{ad})$ [13], where $|T|$ is the
number of states and transitions in the model structure (LTS, KS or KTS) and
$|\phi|$ is the size of the formula (measured, say, by the number of operators). Here, $ad$
refers to the alternation depth of the formula, which essentially is the number
of non-trivial alternations between least and greatest fixpoints. (The alternation
depth of an alternation-free formula is taken to be 1.) While the precise definition
of alternation depth is not uniform in the literature, all definitions share the
intuitive idea and the above stated model-checking complexity. By exploiting
monotonicity, a better time-complexity can be achieved but at the cost of a
large storage requirement [32]. It is a challenging open problem to determine the
precise complexity of $\mu$-calculus model-checking; it might well turn out to be
polynomial (it is known to be in the intersection of the classes NP and co-NP)!

Fig. 9. An automaton corresponding to $\phi = U(U(P, Q), R)$

Alternation-free formulas can be checked efficiently in time $O(|T| \cdot |\phi|)$. This
holds in particular for all CTL-like formulas as they unfold to alternation-free
fixpoint formulas.

4.4 Automata-Theoretic Approach 

For easy accessibility, we illustrate the automata-theoretic approach for checking
PLTL formulas on maximal finite paths in Kripke structures: Given a Kripke
structure $(S, R, I)$, a state $s_0$ in $S$, and a PLTL formula $\phi$, we say that $s_0$ satisfies
$\phi$ if any maximal finite path $\pi = (s_0, s_1, \ldots, s_n)$ satisfies $\phi$.

A path $\pi$ as above can be identified with the finite word $w_\pi = (I(s_0), I(s_1),
\ldots, I(s_n))$ over the alphabet $2^{AP}$. Note that the letters of the alphabet are
subsets of the set of atomic propositions. In a straightforward way, validity of
PLTL formulas can also be defined for such words. Now, a PLTL formula, $\phi$,
induces a language of words over $2^{AP}$ containing just those words that satisfy $\phi$.
For any PLTL formula, the resulting language is regular, and there are systematic
methods for constructing a finite automaton, $A_\phi$, that accepts just this language.
(This automaton can also be used to check satisfiability of $\phi$; we merely check
whether the language accepted by the automaton is non-empty, which amounts
to checking whether a final state of the automaton is reachable from the initial
state.) In general, the size of $A_\phi$ grows exponentially with the size of $\phi$ but is
often small in practice.

Example 2. Consider the formula $\phi = U(U(P, Q), R)$. The corresponding
automaton $A_\phi$ is shown in Fig. 9. We adopt the convention that an arrow marked
with a lower case letter represents transitions for all sets containing the
proposition denoted by the corresponding upper case letter. An arrow marked with p, for
example, represents transitions for the sets {P}, {P,Q}, {P,R}, and {P,Q,R}.
Similarly, a conjunction of lower case letters represents transitions containing
both corresponding upper case propositions. For example, an arrow marked with
$p \land q$ represents transitions marked with {P,Q} and {P,Q,R}.

Fig. 10. A formula and the corresponding automata

Fig. 11. Two Kripke structures, the corresponding automata, and their products with
the formula automaton from Fig. 10. Model-checking succeeds for the left Kripke
structure and fails for the right one.

It is easy to construct from the Kripke structure, $K$, a finite automaton, $A_K$,
accepting just the words corresponding to the maximal finite paths starting
in $s$: It is given by $A_K = (S \cup \{s_f\}, 2^{AP}, \delta, s, \{s_f\})$, where $s_f$ is a new state and is
the only final state in the automaton, and

$$\delta = \{(s, I(s), t) \mid (s,t) \in R\} \cup \{(s, I(s), s_f) \mid s \text{ is final in } K\} .$$

Here, a state $s$ is called final in $K$ if it has no successor, i.e., there is no $t$ with
$(s,t) \in R$.
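
The construction of $A_K$ translates directly into code. The following Python sketch is a minimal illustration; the function names, the tuple layout of the automaton (the alphabet is left implicit), and the two-state example structure are assumptions introduced here.

```python
def kripke_to_automaton(states, R, I, s0, s_f="s_f"):
    """Build A_K as (states, delta, initial, finals).

    R is a set of pairs (s, t); I maps each state to a frozenset of atomic
    propositions; s_f is the fresh final state (assumed not to occur in states).
    """
    has_successor = {s for (s, _t) in R}
    # A transition leaving s reads the letter I(s).
    delta = {(s, I[s], t) for (s, t) in R}
    # States that are final in K get a transition to the new state s_f.
    delta |= {(s, I[s], s_f) for s in states if s not in has_successor}
    return set(states) | {s_f}, delta, s0, {s_f}

def accepts(automaton, word):
    """Run the (possibly nondeterministic) automaton on a finite word."""
    _states, delta, start, finals = automaton
    current = {start}
    for letter in word:
        current = {t for (s, l, t) in delta if s in current and l == letter}
    return bool(current & finals)

# Hypothetical structure: 0 -> 1 with I(0) = {Q} and I(1) = {P, Q}.
I = {0: frozenset({"Q"}), 1: frozenset({"P", "Q"})}
A_K = kripke_to_automaton({0, 1}, {(0, 1)}, I, 0)

print(accepts(A_K, [I[0], I[1]]))  # word of the maximal path (0, 1)
print(accepts(A_K, [I[0]]))        # path (0) is not maximal
```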

To answer the model-checking question amounts to checking whether $L(A_K) \subseteq
L(A_\phi)$. This is equivalent to $L(A_K) \cap L(A_\phi)^c = \emptyset$. The latter property can
effectively be checked: Finite automata can effectively be complemented, the product
automaton of two automata describes the intersection of the languages of the two
automata, and the resulting automaton can effectively be checked for emptiness.

Example 3. Let us illustrate this approach for the formula $\phi \stackrel{\mathrm{def}}{=} F(P) \land G(Q)$
over $AP = \{P, Q\}$. Figure 10 shows the automata generated from $\phi$. The first
automaton in the figure is $A_\phi$, the automaton that accepts the language over
$2^{AP}$ corresponding to $\phi$. Complementation of a finite automaton is
particularly simple if the automaton is deterministic and fully defined, because the
automaton then has for any state exactly one transition for any input symbol. An
automaton can be made deterministic by the well-known power-set construction;
full definedness can be obtained by adding a (non-final) state that "catches"
all undefined transitions. The automaton $A_\phi$ shown in Fig. 10 has been made
deterministic and is fully defined; to keep it small, it has also been minimized.
Such transformations preserve the language accepted by the automaton and are
admissible and commonly applied. The second automaton in Fig. 10, to which
we refer by $A_\phi^c$ in the following, accepts the complement of the language
corresponding to $\phi$. As $A_\phi$ has been made deterministic and is fully defined, it can
easily be complemented by exchanging final and non-final states.

In Fig. 11, we show two rooted Kripke structures $K$, the corresponding
automata $A_K$, and the product automata $A_\phi^c \times A_K$.¹ In order to allow easy
comparison, the states in $A_\phi^c \times A_K$ have been named by pairs ij; i indicates the
corresponding state in $A_\phi^c$ and j the corresponding state in $A_K$.

It is intuitively clear that the left Kripke structure satisfies $\phi$. An
automata-theoretic model-checker would analyze whether the language of $A_\phi^c \times A_K$ is
empty. Here, it would check whether a final state is reachable from the initial
state. It is easy to see that no final state is reachable; this means that $K$ indeed
satisfies $\phi$.

Now, consider the rooted right Kripke structure in Fig. 11. As its final state
does not satisfy the atomic proposition, Q, the Kripke structure does not satisfy
$\phi$: a final state is reachable from the initial state in the product automaton.
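
The emptiness check on the product can be sketched as a reachability search. The Python code below is an illustration under assumptions: the deterministic, fully defined automaton for $F(P) \land G(Q)$ is written by hand with hypothetical state names `q0`, `q1`, and `sink`, the product with $A_K$ is explored on the fly rather than built explicitly, and the two Kripke structures are small stand-ins, not the ones drawn in Fig. 11.

```python
from collections import deque

def dfa_step(q, letter):
    """Hand-written automaton for F(P) /\\ G(Q) on finite words.

    q0: every letter so far contains Q, no P seen yet
    q1: every letter so far contains Q, P seen (the only final state)
    sink: some letter lacked Q
    """
    if q == "sink" or "Q" not in letter:
        return "sink"
    return "q1" if (q == "q1" or "P" in letter) else "q0"

# Complementing a deterministic, fully defined automaton swaps final states.
COMPLEMENT_FINAL = {"q0", "sink"}

def satisfies(R, I, s0):
    """True iff no final state of the complement product is reachable."""
    succ = {}
    for (s, t) in R:
        succ.setdefault(s, []).append(t)
    start = ("q0", s0)
    seen, todo = {start}, deque([start])
    while todo:
        q, s = todo.popleft()
        q2 = dfa_step(q, I[s])          # A_K reads I(s) when leaving s
        if s not in succ:               # s is final in K: A_K moves to s_f
            if q2 in COMPLEMENT_FINAL:
                return False            # a final product state is reachable
            continue
        for t in succ[s]:
            if (q2, t) not in seen:
                seen.add((q2, t))
                todo.append((q2, t))
    return True

# Hypothetical Kripke structures with a single transition 0 -> 1.
good = {0: frozenset({"Q"}), 1: frozenset({"P", "Q"})}
bad = {0: frozenset({"P", "Q"}), 1: frozenset({"P"})}   # final state lacks Q

print(satisfies({(0, 1)}, good, 0))
print(satisfies({(0, 1)}, bad, 0))
```

Exploring only the reachable part of the product mirrors the remark that Fig. 11 shows just the states reachable from the initial state.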

A similar approach to model-checking can be used in the more common case
of checking formulas on infinite paths [42]. In this case, automata accepting
languages of infinite words, like Büchi or Muller automata [41], are used instead
of automata on finite words. Generation of the automata $A_\phi$ and $A_K$ as well
as the automata-theoretic constructions (product construction, non-emptiness
check) are more involved, but nevertheless, the basic approach remains the same.
PLTL model-checking is in general PSPACE-complete [17]; the exponential
blow-up in the construction of $A_\phi$ is unavoidable.

The automata-theoretic approach is mainly applied to linear-time logics, whose
formulas denote languages consisting of words. The approach can also be applied,
however, to branching-time logics by using various forms of tree automata.



4.5 Tableau Approach 

¹ Strictly speaking, only that part of the state space is shown that is reachable from
the initial state, and not the full product automaton.

The tableau approach addresses the local model-checking problem: For a model,
$\mathcal{M}$, and property, $\phi$, we wish to learn whether $s \models^{\mathcal{M}} \phi$ holds for just the one
state, $s$; global information is unneeded. We might attack the problem by a
search of the state space accessible from $s$, driving the search by decomposing
$\phi$. We write our query as $s \vdash_\Delta \phi$ (taking the $\mathcal{M}$ as implicit) and use subgoaling
rules, like the following, to generate a proof search, a tableau. For the moment,
the rules operate on just the Hennessy-Milner calculus; the subscript, $\Delta$, will be
explained momentarily:



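$$
\frac{s \vdash_\Delta \phi_1 \land \phi_2}{s \vdash_\Delta \phi_1 \qquad s \vdash_\Delta \phi_2}
\qquad
\frac{s \vdash_\Delta \phi_1 \lor \phi_2}{s \vdash_\Delta \phi_1}
\qquad
\frac{s \vdash_\Delta \phi_1 \lor \phi_2}{s \vdash_\Delta \phi_2}
$$

$$
\frac{s \vdash_\Delta [a]\phi}{s_1 \vdash_\Delta \phi \;\cdots\; s_n \vdash_\Delta \phi}
\quad \text{where } \{s_1, \ldots, s_n\} = \{s' \mid s \xrightarrow{a} s'\}
\qquad
\frac{s \vdash_\Delta \langle a \rangle \phi}{s' \vdash_\Delta \phi}
\quad \text{if } s \xrightarrow{a} s'
$$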



A tableau for a Hennessy-Milner formula must be finite. We say that the tableau
succeeds if all its leaves are successful, that is, they have form (i) $s \vdash_\Delta \mathit{true}$, or
(ii) $s \vdash_\Delta [a]\phi$ (which implies there are no a-transitions from $s$).

It is easy to prove that a successful tableau for $s \vdash_\Delta \phi$ implies $s \models^{\mathcal{M}} \phi$;
conversely, if there exists no successful tableau, we correctly conclude that $s \not\models^{\mathcal{M}} \phi$.
(Of course, a proof search that implements and/or subgoaling builds just one
"meta-tableau" to decide the query.)
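
The and/or subgoaling for the Hennessy-Milner rules can be sketched as a short recursive search. The Python code below is an illustration, not the authors' procedure; the tuple encoding of formulas and the function name `holds` are assumptions introduced here.

```python
def holds(lts, s, f):
    """Decide the query s |- f for a Hennessy-Milner formula by subgoaling.

    lts is a set of triples (source, action, target); formulas are tuples:
    ("true",), ("and", f, g), ("or", f, g), ("box", a, f), ("dia", a, f).
    """
    op = f[0]
    if op == "true":
        return True                      # successful leaf of form (i)
    if op == "and":                      # both subgoals must succeed
        return holds(lts, s, f[1]) and holds(lts, s, f[2])
    if op == "or":                       # one of the two rules must succeed
        return holds(lts, s, f[1]) or holds(lts, s, f[2])
    if op in ("box", "dia"):
        succs = [t for (p, a, t) in lts if p == s and a == f[1]]
        results = (holds(lts, t, f[2]) for t in succs)
        # [a]f with no a-successors is a successful leaf of form (ii);
        # <a>f with no a-successors has no applicable rule and fails.
        return all(results) if op == "box" else any(results)
    raise ValueError(f"unsupported formula: {f!r}")

lts = {("s", "a", "t")}
print(holds(lts, "s", ("dia", "a", ("true",))))
print(holds(lts, "s", ("box", "b", ("dia", "a", ("true",)))))
print(holds(lts, "t", ("dia", "a", ("true",))))
```

Only states reachable from the queried state through the modalities of the formula are ever inspected, which is the point of the local approach.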

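When formulas in the modal mu-calculus are analyzed by tableau, there is
the danger of infinite search. But for finite-state models, we can limit search due
to the semantics of the fixed-point operators: Say that $s \vdash_\Delta \mu X.\phi_X$ subgoals to
$s \vdash_\Delta X$. We conclude that the latter subgoal is unsuccessful, because

$$
s \models^{\mathcal{M}} \mu X.\phi_X \quad \text{iff} \quad s \models^{\mathcal{M}} \bigvee_{i \ge 0} X_i,
\quad \text{where } X_0 = \mathit{false}, \quad X_{i+1} = \phi_{X_i} .
$$

That is, the path in the tableau from $s \vdash_\Delta \mu X.\phi_X$ to $s \vdash_\Delta X$ can be unfolded
an arbitrary number of times, generating the $X_i$ formulas, all of which subgoal
to $X_0$, which fails.

Dually, a path from $s \vdash_\Delta \nu X.\phi_X$ to $s \vdash_\Delta X$ succeeds, because

$$
s \models^{\mathcal{M}} \nu X.\phi_X \quad \text{iff} \quad s \models^{\mathcal{M}} \bigwedge_{i \ge 0} X_i,
\quad \text{where } X_0 = \mathit{true}, \quad X_{i+1} = \phi_{X_i} .
$$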

As suggested by Stirling and Walker [37], we analyze the fixed-point operators
with unfolding rules, and we terminate a search path when the same pair of state
and fixed-point formula repeats. Of course, we must preserve the scopes of nested
fixed points, so each time a fixed-point formula is encountered in the search,
we introduce a unique label, $U$, to denote it. The labels and the formulas they
denote are saved in an environment, $\Delta$.

Here are the rules for $\mu$; the rules for $\nu$ work in the same way:

$$
\frac{s \vdash_\Delta \mu X.\phi_X}{s \vdash_{\Delta'} U}
\quad \text{where } \Delta' = \Delta + [U \mapsto \mu X.\phi_X] \text{ and } U \text{ is fresh for } \Delta
$$

$$
\frac{s \vdash_\Delta U}{s \vdash_\Delta \phi_U}
\quad \text{where } \Delta(U) = \mu X.\phi_X
$$

Important: See note below.

Transition system: [diagram with states s and t]

Let:
    $\phi = \nu X.\phi'_X \land [b]X$
    $\Delta_1 = [U_1 \mapsto \phi]$
    $\Delta_2 = \Delta_1 + [U_2 \mapsto \ldots]$

Tableau for $s \vdash_\emptyset \phi$:

    $s \vdash_\emptyset \phi$
    $s \vdash_{\Delta_1} U_1$
    $s \vdash_{\Delta_1} \phi'_{U_1} \land [b]U_1$
    $s \vdash_{\Delta_1} \phi'_{U_1}$                                $s \vdash_{\Delta_1} [b]U_1$
    $s \vdash_{\Delta_2} U_2$                                        $s \vdash_{\Delta_1} U_1$
    $s \vdash_{\Delta_2} \langle a \rangle \mathit{true} \lor \langle b \rangle U_2$
    $s \vdash_{\Delta_2} \langle a \rangle \mathit{true}$
    $t \vdash_{\Delta_2} \mathit{true}$

Fig. 12. Transition system and model check by tableau



Note: The second rule can be applied only when $s \vdash_{\Delta'} U$ has not already
appeared as an ancestor goal. This is how proof search is terminated.

A leaf of form $s \vdash_\Delta U$ is successful iff $\Delta(U) = \nu X.\phi_X$. Figure 12 shows a
small transition system and proof by tableau that its state, $s$, has the property
$\nu X.(\mu Y.\langle a \rangle \mathit{true} \lor \langle b \rangle Y) \land [b]X$, that is, an a-transition is always finitely reachable
along a path of b-transitions. The tableau uses no property of state $t$.
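
The unfolding rules, the ancestor check, and the leaf condition can be combined into a small local checker. The Python sketch below is a simplified illustration in the spirit of the rules above, not Stirling and Walker's algorithm as published: the tuple encoding, the names `subst` and `tableau`, and both example transition systems are assumptions (the first system is one that is consistent with the tableau of Fig. 12, the second is the one implied by the tables of Example 1).

```python
from itertools import count

def subst(f, var, label):
    """Replace free occurrences of variable `var` in f by `label`."""
    op = f[0]
    if op == "var":
        return label if f[1] == var else f
    if op in ("and", "or"):
        return (op, subst(f[1], var, label), subst(f[2], var, label))
    if op in ("box", "dia"):
        return (op, f[1], subst(f[2], var, label))
    if op in ("mu", "nu"):               # an inner binder may shadow `var`
        return f if f[1] == var else (op, f[1], subst(f[2], var, label))
    return f                             # ("true",) or an existing label

def tableau(lts, state, formula):
    """Local model check of a closed mu-calculus formula at `state`."""
    fresh = count(1)

    def prove(s, f, delta, ancestors):
        op = f[0]
        if op == "true":
            return True
        if op == "and":
            return prove(s, f[1], delta, ancestors) and prove(s, f[2], delta, ancestors)
        if op == "or":
            return prove(s, f[1], delta, ancestors) or prove(s, f[2], delta, ancestors)
        if op in ("box", "dia"):
            succs = [t for (p, a, t) in lts if p == s and a == f[1]]
            results = (prove(t, f[2], delta, ancestors) for t in succs)
            return all(results) if op == "box" else any(results)
        if op in ("mu", "nu"):           # first rule: introduce a fresh label
            label = ("label", next(fresh))
            return prove(s, label, {**delta, label: f}, ancestors)
        if op == "label":
            kind, var, body = delta[f]
            if (s, f) in ancestors:      # goal repeats: leaf, nu succeeds
                return kind == "nu"
            return prove(s, subst(body, var, f), delta, ancestors | {(s, f)})
        raise ValueError(f"unsupported formula: {f!r}")

    return prove(state, formula, {}, frozenset())

A_TRUE = ("dia", "a", ("true",))
phi = ("nu", "X", ("and",
                   ("mu", "Y", ("or", A_TRUE, ("dia", "b", ("var", "Y")))),
                   ("box", "b", ("var", "X"))))
psi = ("nu", "Z", ("mu", "Y", ("or", ("dia", "b", ("var", "Z")),
                                     ("dia", "a", ("var", "Y")))))

fig12 = {("s", "b", "s"), ("s", "a", "t")}      # assumed system for Fig. 12
example1 = {("t", "b", "s"), ("t", "a", "t")}   # assumed LTS of Example 1

print(tableau(fig12, "s", phi))
print(tableau(example1, "t", psi))
```

The recursion explores the and/or alternatives of a single "meta-tableau"; because every encounter of a fixed-point formula receives a fresh label, the search can be expensive on larger formulas, so this sketch is meant for small examples only.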

The tableau technique is pleasant because it is elegant, immune to the trou- 
bles of alternating fixpoints, and applicable to both branching-time and linear- 
time logics. 

5 Conclusion 

One of the major problems in the application of model checking techniques to 
practical verification problems is the so-called state explosion problem: models 
typically grow exponentially in the number of parallel components or data ele- 
ments of an argument system. This observation has led to a number of techniques 
for tackling this problem [39,14]. 

Most rigorous are compositional methods [2,10,23], which try to avoid the
state explosion problem in a divide-and-conquer fashion. Partial order methods
limit the size of the model representation by suppressing unnecessary
interleavings, which typically arise as a result of the serialization during the model
construction of concurrent systems [19,43,20]. Binary Decision Diagram-based
codings, today's industrially most successful technique, allow a polynomial system
representation, but may explode in the course of the model checking process
[4,6,18]. All these techniques have their own very specific profiles. Exploring these
profiles is one of the current major research topics.

Another fundamental issue is abstraction: depending on the particular property
under investigation, systems may be dramatically reduced by suppressing
details that are irrelevant for verification, see, e.g., [15,9,22,30]. Abstraction is
particularly effective when it is combined with the other techniques.

In this article we have focused on finite model structures, but recent research 
shows that effective model-checking is possible also for certain classes of finitely 
presented infinite structures. Work in this direction falls in two categories: First, 
continuous variables have been added to finite structures. This work was moti- 
vated by considerations on verified design of embedded controllers. Timed sys- 
tems have found much attention (Alur and Dill’s work on timed automata [1] 
is a prominent example) but also more general classes of hybrid systems have 
been considered. A study of this work could start with [38] where besides a gen- 
eral introduction three implemented systems, HyTech, Kronos, and Uppaal, are 
described. Second, certain classes of discrete infinite systems have been studied 
that are generated by various types of grammars in various ways. The interested
reader is pointed to the surveys [8,7] that also contain numerous references.

Abstract. Since the introduction of data flow analysis more than 20 years ago, the
applications of data flow analysis have expanded considerably with the recognition 
of its practical benefits. The current use of data flow analysis goes well beyond its 
initial application of register allocation and machine independent optimizations. 
Compilers today rely heavily on data flow analyses for sophisticated optimiza- 
tions and to guide the exploitation of architectural features, such as the numbers 
of processors and their functional units and the memory hierarchy. Besides com- 
pilers, data flow analysis is also used in software engineering. Applications include 
program verification, debugging (especially of optimized and parallelized code), 
program test case generation and coverage analysis, regression testing, program 
integration and program understanding. 

The expanded use of data flow analysis has created a demand for a number of 
extensions and improvements. Advances in data flow analysis have particularly 
occurred to improve its scalability and precision. Techniques that produce more 
informative results about the run-time behavior and environment have also been 
developed by integrating dynamic and architectural information into the analysis 
and its results. Data flow analysis has been extended to model different program- 
ming languages and features, including the object-oriented paradigm and parallel 
threads. 

This tutorial will first present a broad overview of the recent advances in data flow 
analysis and then focus on techniques that improve the scalability and precision 
of the analysis. 

Concern about the scalability of data flow analysis, both in terms of execution
time and memory, is due to the need for whole program analysis, the use of
multiple analyses, and the requirements of applications in a production environment.
Techniques to improve the performance of analyses have focused on both the
program representation used by the analysis and the analysis itself. A number of graph
representations have been developed that permit direct connections between the
generation of data flow information and the use of that information. Other
representations have been proposed to enable more efficient interprocedural analysis by
producing summary information about procedures. To improve the scalability of
analysis, techniques have targeted reducing the number of program points that are
modeled and reducing the number of quantities that are modeled simultaneously.
Demand driven analysis and partitioning are the major approaches to improve
the execution time performance and memory demands of analysis. Performance
improvements have also been addressed for the recomputation of data flow by
incremental updating of data flow information after changes are made to the program.

Path-based approaches have been developed to improve the precision of data 
flow analysis. The precision of interprocedural analysis is improved by eliminat- 
ing paths that are invalid based on the procedure call and return structure. Other 
techniques eliminate paths that are infeasible due to branch correlation. In both of 
these cases, the precision of the analysis is improved by eliminating spurious facts 
due to unrealizable paths. Another type of path-based techniques has as its focus 
the improvement of precision in the information produced for certain paths. One 
approach separates particular paths, namely frequently executed paths, to improve 
the precision of the analysis on the separated paths. Other techniques improve the 
precision of the analysis by proving a distributive formulation of non-distributed 
data flow problems. These techniques incorporate information about the quantities 
generated on separate paths into the analysis to enable a more detailed represen- 
tation of the quantities and hence less conservative merging at confluence points. 

The tutorial will conclude by discussing the needs of various applications in terms
of data flow information and future directions for data flow analysis.